A Thesis

entitled

The Maize TFome 2.0: Genomic Analysis of Transcription Factor Repertoire

by

Rachael Wasikowski

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the

Master’s Degree in

Bioinformatics, Proteomics and Genomics

______John Gray, Committee Chair

______Scott Leisner, Committee Member

______Alexei Fedorov, Committee Member

______Dr. Amanda Bryant-Friedrich, Dean College of Graduate Studies

The University of Toledo

May 2018

Copyright 2018, Rachael Wasikowski

This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author.

An Abstract of

The Maize TFome 2.0: Genomic Analysis of Transcription Factor Repertoire

by

Rachael Wasikowski

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master’s Degree in Bioinformatics, Proteomics and Genomics

The University of Toledo

May 2018

The complete transcription factor (TF) repertoire of an organism is a very useful resource in the study of regulomics. A physical clone collection of each TF’s open reading frame (ORF) comprises a TFome library and has the potential to experimentally place transcription factors within a regulatory network. The Maize TFome 2.0 is a correction and expansion of the transcription factor repertoire previously identified in Maize TFome 1.0 (Burdo et al., 2014). The predicted gene models for uncloned transcription factors and coregulators (CRs) were analyzed using RNA-seq data from 42 different tissues and growth conditions. Models were ranked for future cloning efforts based upon the level of read support. In the future, all well-supported TFs and CRs will be cloned and added to the maize TFome 2.0 collection. As part of expanding the maize TFome, the second largest family of maize TFs, named the Apetala2/Ethylene Response Element Binding Protein (AP2/EREBP) family, was analyzed using synteny and coexpression data. As a result of a whole genome duplication event in maize, it was found that 54% of the AP2/EREBP genes are duplicated. Synteny data and an HMM scan of the new maize genome revealed 24 new members of the AP2/EREBP family.

Members of the AP2/EREBP family were sorted into four subfamilies based on the domains present and the sequence of their DNA binding motif. Some syntelog pairs were highly coexpressed, suggesting functional redundancy, while others are considered to have diverged in function based on their different levels of expression during plant development.

Acknowledgements

I would like to thank Todd Lenz, Patrick Dusza, and Alex Fazekas for assisting with the gene model rankings and RNA-seq data extraction. Their efforts helped keep this project running smoothly.


Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations
1 Analysis of Uncloned Transcription Factors and CoRegulators in Maize TFome
1.1 Introduction
1.1.1 The Maize Regulome
1.1.2 General Utility of TFome Collections
1.1.3 TFome Studies in Plants and other Organisms
1.1.4 Study Objectives
1.2 Materials and Methods
1.2.1 Collection and Refinement of the Maize TFome RNA-seq Data
1.2.2 Interpretation of RNA-seq Data and Ranking of Gene Model Quality
1.3 Results
1.3.1 Census of Uncloned TFs and CRs in the Maize Genome
1.3.2 Evaluation of Gene Model Quality for Uncloned TFs and CRs in the maize genome
1.4 Discussion and conclusions
2 Analysis of the AP2/EREBP Transcription Factor Family in Maize
2.1 Introduction
2.1.1 Overview of the AP2/EREBP TF Family in Maize
2.1.2 Known Functions of AP2/EREBP Genes
2.1.3 AP2/EREBP Subfamily RAV
2.1.4 The Influence of Whole Genome Duplication (WGD) on maize TF repertoire
2.1.5 The Effect of Redundant Syntenic Genes
2.1.5 Study Objectives
2.2 Materials and Methods
2.2.1 Detection of Synteny Amongst Maize AP2/EREBP Family Members
2.2.2 Coexpression Analysis of the AP2/EREBP Family
2.2.3 Phylogenetic Analysis of the AP2 domain
2.3 Results
2.3.1 Survey of EREBP Gene Structures in Maize
2.3.2 Phylogenetic Analysis of the EREBP family in maize
2.3.3 Detection of Synteny amongst Maize EREBP family members
2.3.4 Coexpression Analysis of Maize EREBP family members
2.4 Discussion and conclusions
References
A Table A.1: List of Uncloned TF Genes and Gene Model Rankings
B Table A.2: List of Uncloned CR Genes and Gene Model Rankings
C Table A.3: List of Syntelogs of EREBP genes in Maize
D Table A.4: List of EREB Genes with likely deletions of syntelog
E CIRCOS Ribbon file
F CIRCOS Gene coordinates file
G CIRCOS Ideogram file
H CIRCOS Configuration file
I R script for coexpression analysis

List of Tables

1.1 Tandem Duplications of Uncloned Transcription Factors
2.1 List of Maize AP2/EREBP Genes with known association
2.2 Summary of Self-Synteny in Maize Genome V4
2.3 List of Syntenic EREB Gene Pairs in Maize
2.4 List of syntenic EREB family members that are highly coexpressed
2.5 List of syntenic EREB family members that are not coexpressed
2.6 List of non-syntenic EREB family members that are highly coexpressed
A.1 List of Uncloned TF Genes and Gene Model Rankings
A.2 List of Uncloned CR Genes and Gene Model Rankings
A.3 List of Syntelogs of EREBP genes in Maize
A.4 List of EREB Genes with likely deletions of syntelog

List of Figures

1-1 Bar graph of the total number of maize transcription factor V4 gene models and the number of previously cloned genes within each gene family
1-2 Bar graph of the total number of maize CoRegulator (CR) V4 gene models and the number of previously cloned genes within each gene family
1-3 Analysis pipeline for mapping of RNA-Seq reads to maize genome V4.37
1-4 Ranking System for Maize Gene Model Support based on RNA-Seq Data
1-5 Summary of Ranked Maize TF and CR Gene Model Support based on RNA-Seq Data
1-6 Example of V3 models not corresponding to gene models from Maize genome V4 that varied in RNA-seq read mapping support
2-1 Relationship between AP2 and ABI3 domain containing proteins of Arabidopsis
2-2 Structures of the AP2 and B3 domains found in EREBP family members
2-3 Whole Genome Duplication (WGD) Event in Zea mays lineage
2-4 Example of Syntenic Region detected between Maize and Sorghum using SyMAP 4.2
2-5 Summary of AP2/EREBP family Gene Structures in Maize
2-6 Phylogenetic Tree of 208 EREBP family members in Maize
2-7 Schematic of syntenic ZmEREBP family members that are highly coexpressed

List of Abbreviations

AP2          Apetala2
At           Arabidopsis thaliana
BLAST        Basic Local Alignment Search Tool
C-terminus   carboxyl terminus of protein
CR           CoRegulator
CRE          cis-regulatory element
DBD          DNA-binding domain
DNA          deoxyribonucleic acid
DREB         Drought response element binding protein
e.g.         for example
EREBP        ethylene-responsive element binding protein
ERF          ethylene response factor
Fig          Figure
GRN          Gene Regulatory Network
HMM          Hidden Markov Model
mRNA         messenger RNA
Mya          million years ago
N-terminus   amino terminus of protein
ORF          open reading frame
Os           Oryza sativa
RNA          ribonucleic acid
RAV          Related to ABI3/VP1
Sb           Sorghum bicolor
TF           transcription factor
Zm           Zea mays

Chapter 1

Analysis of Uncloned Transcription Factors and CoRegulators in Maize TFome

1.1 Introduction

1.1.1 The Maize Regulome

Transcription factors (TFs) are responsible for the regulation of gene expression in all living things. TFs activate or repress genes that control crucial processes such as proliferation, cell differentiation, and stress responses. They are expressed temporally and spatially within different tissues in response to developmental signals or external stimuli, such as light or heat (Seaver et al., 2015). Some TFs are expressed in a tissue-specific or time-specific fashion, whereas others are expressed almost globally or constitutively in an organism. A protein is considered a transcription factor if it can recognize cis-regulatory elements and induce or repress the transcription of a target gene (Burdo et al., 2014). Coregulators (CRs) are proteins that usually act in concert with TFs to fine-tune gene regulation, but usually do not bind DNA directly. For the purposes of this study, both TFs and CRs in the maize genome have been studied as the trans-acting components of the maize regulome.

A TFome is a collection of all the known TFs and CRs present within a species. Such a collection is most valuable when the predicted gene models are supported by a physical library of the cloned genes. In the case of the maize genome, a physical library of approximately 70% of the regulatory genes has been constructed and made publicly available (Burdo et al., 2014; Gray et al., 2015). In addition, an online database entitled GRASSIUS (www.grassius.org) has been developed to summarize the predicted TFome of maize and other graminaceous relatives, namely Brachypodium, rice, sorghum, and sugarcane. The GRASSIUS server currently contains all predicted maize transcription factor genes as of the Maize B73 Genome release version 3. There are plans to update all gene model names to version 4 of the genome (Yilmaz et al., 2009). Within GRASSIUS, the GRASSTFDB and the GRASSCoRegDB both permit the user to extract the gene name, canonical model, and cloned DNA sequence (if available). Information relating to clones belonging to the maize TFome 1.0 can be accessed and downloaded either separately or in their entirety.

1.1.2 General Utility of TFome Collections

The availability of a TFome can greatly accelerate investigation of gene regulatory networks (GRNs) within a species by facilitating global screens. Both TFome and ORFeome (i.e. a collection of all coding sequences in a genome) efforts are underway for only a few other model organisms. For example, the human ORFeome V8.1 contains over 11,000 Gateway clones of possible ORFs (Wiemann et al., 2016). The C. elegans and D. melanogaster ORFeomes cover approximately 60% of their respective genomes, with about 11,000 ORFs in each collection (http://flybi.hms.harvard.edu) (Lamesch et al., 2004). The yeast ORFeome is comprised of about 4910 clones covering about 91% of the genome (Matsuyama et al., 2006). Two TFomes have been developed for the model dicot plant Arabidopsis (Breton et al., 2016; Riechmann et al., 2000). Such ORFeomes and TFomes have been used in genome-wide screens (Weston et al., 2017; Yang et al., 2017).

Established techniques such as yeast one-hybrid (Y1H) and yeast two-hybrid (Y2H) can help elucidate which TFs bind to a particular DNA sequence, or which CRs interact with a given TF. Because some TFs are expressed only under very particular conditions, such as high temperature stress or within an embryo at 14 days old, a complete TF library helps to ensure that interactions of rare transcription factors are not missed in these screens. Once the interactions between TFs, CRs, and their target cis-elements are discovered, they can be used to establish gene regulatory networks that permit the dissection of complex regulatory dynamics. Transcription factors can repress or induce the transcription of a gene, depending on a stimulus. Target gene roles can be narrowed by their placement within a GRN. This in turn can help establish pathways of an organism’s response to a stimulus. For example, drought can promote the expression of DBF1 (ZmEREB204), which (along with ) activates the expression of rab17. In non-drought conditions, DBF2 is expressed, and rab17 is negatively regulated (Huang et al., 2016; Kizis and Pages, 2002).

When a physical collection is not made, predicted TFomes can be established or refined utilizing ESTs and RNA-seq data. Utilizing bioinformatics to predict transcription factors can accelerate the discovery of rare TFs. For example, bioinformatics approaches have revealed up to roughly 150 new transcription factors in the C. elegans genome from a list of previously unsupported gene models (Wei et al., 2005).

1.1.3 TFome Studies in Plants and other Organisms

Only a few TFome efforts have been reported to date in plant species (Chai et al., 2015; Nishiyama et al., 2014). In the model plant Arabidopsis thaliana, there are an estimated 2492 TFs, and recently a TFome collection comprised of about 2000 clones has been reported and used in a Y1H screen (Breton et al., 2016; Pruneda-Paz et al., 2014). The number of TF families present within the Arabidopsis genome appears to be largely conserved in the plant kingdom without much variation in the number within each family. For example, the AP2/EREBP family was predicted to contain 145 members in Arabidopsis, and 212 in maize (Burdo et al., 2014; Riechmann et al., 2000). All of the plant-specific transcription factors are present within both species due to their relatively recent divergence and conserved functions. Oryza sativa, Arabidopsis, and maize all possess a set of plant-specific genes known as “DREB” or “DBF” that regulate drought response (Kizis and Pages, 2002). These TFs play an important role in protecting the plant during dehydration conditions (Yang et al., 2015). In soybean, a targeted TFome was constructed composed of just 207 stress-related TFs (Chai et al., 2015; Wang et al., 2010). Plant TFome/ORFeome efforts utilize bioinformatics to survey TF repertoires before attempting cloning. A TFome online database known as SoyKB (http://soykb.org/TF_ORFeome/) contains information on all 5,671 predicted soybean TFs. This interface allows the user to access the coding sequence and view the gene model of any soybean TF (Chai et al., 2015; Wang et al., 2010). In contrast, the sugarcane ORFeome contains a total of 937 predicted transcription factors, which is likely an underestimate by several fold as it was generated by analyzing cDNA libraries and not a complete genome (Nishiyama et al., 2014).

Outside of plants, a significant effort has been made to generate a collection of clones for the human TFome. The human genome contains approximately 1600 TFs, which is about half that of the maize genome. However, the maize genome underwent a whole genome duplication event and thus there are many duplicate TFs. The repertoire of TF families in humans differs from that of plants; for example, there is an enrichment of TFs with a C2H2-ZF domain, with over 700 in the human TFome (Lambert et al., 2018). Generation of a complete human TFome also suffers from the same major problem as plant TFomes, namely poor genome quality (Perdomo-Sabogal et al., 2014). When gaps in a genome occur, a canonical gene model cannot be accurately supported. The lack of a complete open reading frame makes it more difficult to predict if a TF is present, or if it is experimentally supported. In addition to poor genome quality, the human TFome and the maize TFome both contain TFs that do not properly fit into a transcription factor family (Lambert et al., 2018). This is caused by the presence of unusual protein domains or an atypical gene model. TFs that do not fit into a family are put in a category known as Orphans. The aim of both TFomes is to sort every transcription factor into a family of related proteins. Comparing TF repertoires across species can also reveal interesting mutations that may have contributed to speciation. For example, the human FOXP2 TF only differs from the chimpanzee version by two amino acids, but their target genes are different. The human FOXP2 gene is known to have contributed to the development of advanced speech (Perdomo-Sabogal et al., 2014).

1.1.4 Study Objectives

The goals of this study are as follows:

The overall goal of this part of the study is to re-evaluate the maize genome with regard to TF models that have not yet been added to the maize TFome collection. A future practical aim beyond this study is to expand the Maize TFome library with experimentally supported genes. The advent of an updated maize genome (V4) (Jiao et al., 2017) and dozens of RNA-Seq datasets permits the uncloned genes to be evaluated for model accuracy as well as tissue and temporal expression patterns. In addition, a few TF gene families have been added or reevaluated in recent years. Three new TF families (NF-X1, TIFFY, and RWP-RK) and two new subfamilies (AP2/EREBP-RAV and C2C2-LSD) have been added. To accomplish this survey, the following bioinformatic analyses were performed.

1: Publicly available RNA-seq datasets from the qTELLER database were aligned to the Maize Genome version 4 using the Linux packages HiSat2 and samtools. The output files permitted visualization of the RNA-seq data within the Integrative Genomics Viewer.

2: All previously uncloned gene models from the Maize TFome were reexamined for their RNA-seq experimental support and gene model reliability. Gene models were ranked by RNA-seq read support and adherence to the canonical maize gene model.

3: Genes with strong RNA-seq read support and strong canonical gene model support were listed as potential candidates for cloning and addition to the maize TFome.

1.2 Materials and Methods

1.2.1 Collection and Refinement of the Maize TFome RNA-seq Data

RNA-seq dataset reference numbers from a collection of expression studies in maize were first obtained from the qTELLER database (www.qteller.com). The raw datasets were downloaded in the FASTQ format from the NCBI Sequence Read Archive (SRA) FTP server www.ncbi.nlm.nih.gov/sra (Leinonen et al., 2011). The data included sixty-six RNA-seq datasets which were derived from nine separate studies (Bolduc et al., 2012; Chang et al., 2012; Chettoor et al., 2014; Davidson et al., 2011; Johnston et al., 2014; Kakumanu et al., 2012; Li et al., 2010; Wang et al., 2009; Waters et al., 2011). The gene expression data was derived from sixty-six tissues, including environmentally stressed tissues, and resulted in a 721.41 GB file size before alignment. The maize reference V4.37 genome and maize general feature format (GFF) files were downloaded from the EnsemblPlants server (http://plants.ensembl.org/Zea_mays/Info/Index).

The pipeline for processing RNA-seq files was as follows (Fig. 1-3). The Linux package HiSat2 (V2.1.0) was utilized to align the RNA-seq reads to the maize genome (V4, rel 37) (Kim et al., 2015). HiSat2 component lzma was disabled prior to analysis and replaced with package xz. HiSat2 detected splice sites in the maize genome from the information provided in the maize GFF file. Next, each RNA-seq dataset was aligned individually to the maize genome. Each alignment produced an unsorted BAM file that required further refinement. Linux package samtools version 1.5 was then used to generate a sorted BAM file from the HiSat2 output. The sixty-six sorted BAM files were then combined into eight groups, each merged into a larger BAM file not exceeding 35 GB. The grouped BAM files were sorted again and an index file was generated to visualize reads. Lastly, a list file was created in order to load all of the BAM and BAI files simultaneously. The Integrative Genomics Viewer (IGV) was utilized to visualize the gene expression data. To display reads matched to gene models, the maize GFF file for genome V4 was loaded into IGV, followed by the combined list file of all aligned RNA-seq data.
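The align, sort, merge and index steps described above can be sketched as a small Python helper that only assembles command lines and does not run them. This is an illustration under stated assumptions, not the exact invocation used in this study: the function name, file names, index prefix and splice-site file are hypothetical, single-end reads are assumed, and the options shown (`-x`, `-U`, `-S` and `--known-splicesite-infile` for HiSat2; `sort -o`, `merge` and `index` for samtools) are standard ones chosen for the example. HiSat2 writes SAM output, which `samtools sort` converts to a sorted BAM file.

```python
from typing import List


def build_alignment_commands(
    fastq_files: List[str],
    index_prefix: str = "maize_v4",          # hypothetical HiSat2 index prefix
    splice_sites: str = "splice_sites.txt",  # hypothetical known-splice-site file
    merged_bam: str = "group1.bam",          # hypothetical name of one grouped BAM
) -> List[List[str]]:
    """Assemble (but do not execute) align -> sort -> merge -> index commands.

    Assumes single-end reads; paired-end data would use -1/-2 instead of -U.
    """
    commands: List[List[str]] = []
    sorted_bams: List[str] = []
    for fastq in fastq_files:
        stem = fastq.rsplit(".", 1)[0]
        sam, bam = f"{stem}.sam", f"{stem}.sorted.bam"
        # Align one dataset at a time to the reference index.
        commands.append([
            "hisat2", "-x", index_prefix,
            "--known-splicesite-infile", splice_sites,
            "-U", fastq, "-S", sam,
        ])
        # Coordinate-sort the alignment and write it as BAM.
        commands.append(["samtools", "sort", "-o", bam, sam])
        sorted_bams.append(bam)
    # Combine the sorted per-dataset BAM files into one grouped BAM file.
    commands.append(["samtools", "merge", merged_bam, *sorted_bams])
    # Build the index that IGV needs alongside the grouped BAM file.
    commands.append(["samtools", "index", merged_bam])
    return commands


if __name__ == "__main__":
    # Placeholder accession-style file names, not datasets from this study.
    for cmd in build_alignment_commands(["SRR0000001.fastq", "SRR0000002.fastq"]):
        print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps the sketch easy to inspect and could be passed to `subprocess.run` one command at a time if execution were wanted.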

1.2.2 Interpretation of RNA-seq Data and Ranking of Gene Model Quality

Following loading of RNA-seq reads into IGV, individual maize V4 gene models could be analyzed based on their read support. Models were ranked for their read coverage and support for the splicing pattern of the current version 4 gene model (Fig. 1-4). The highest-ranking gene models received a score of “5” for full read coverage of the entire model and total adherence to the annotated gene model. A score of “4” signified that the model had nearly full coverage, yet was flawed in one or two ways. Typically, a score of 4 indicated a gap in the maize genome or read support for an extra intron or exon that is not present in the annotated gene model. A score of “3” indicated that the model had support for some of the annotated exons and introns, but not all of them. A score of “2” indicated that there were few reads present, or that they did not support the annotated gene model. A model ranked “1” had fewer than ten reads in total, whether they supported the model or not. Models lacking any reads were given a ranking of “0”.
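The ranking in this study was assigned by inspecting each model in IGV. Purely as an illustration, the scoring rules above can be approximated by a function over a few summary numbers. The function name, its inputs and the numeric cutoffs are assumptions: the ten-read limit comes from the text, the 50% boundary follows the caption of Figure 1-4, and the 0.9 value standing in for "nearly full" coverage is hypothetical.

```python
NEAR_FULL = 0.9  # hypothetical cutoff for "nearly full" support
HALF = 0.5       # boundary between ranks 2 and 3, after the Figure 1-4 caption


def rank_gene_model(
    total_reads: int,
    exon_coverage: float,    # fraction of annotated exonic bases covered (0 to 1)
    splice_support: float,   # fraction of annotated introns confirmed by spliced reads;
                             # pass 1.0 for single-exon models
    has_genome_gap: bool = False,
    has_unannotated_feature: bool = False,  # reads support an extra exon or intron
) -> int:
    """Return an approximate 0-5 support rank for one gene model."""
    if total_reads == 0:
        return 0
    if total_reads < 10:
        return 1
    fully_supported = exon_coverage >= 1.0 and splice_support >= 1.0
    if fully_supported and not (has_genome_gap or has_unannotated_feature):
        return 5
    if exon_coverage >= NEAR_FULL and splice_support >= NEAR_FULL:
        return 4
    if exon_coverage > HALF and splice_support > 0:
        return 3
    return 2


if __name__ == "__main__":
    print(rank_gene_model(500, 1.0, 1.0))                       # full support
    print(rank_gene_model(500, 1.0, 1.0, has_genome_gap=True))  # full support, genome gap
```

A rule-based score like this could only pre-sort models for review; borderline cases such as alternative splice sites would still need to be checked by eye.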


1.3 Results

1.3.1 Census of Uncloned TFs and CRs in the Maize Genome

There were 3031 predicted transcription factors and coregulators in the maize genome V3. A total of 2016 of these genes were cloned and became a part of the initial Maize TFome 1.0 collection (Burdo et al., 2014). To move towards a more complete TFome 2.0 collection, the goal of this study was to analyze the quality of the gene models representing the remaining uncloned genes. Often, these genes were not previously cloned because they had poor RNA-seq support or had a flawed gene model in V3 of the maize genome. These models had been selected based on the presence or absence of motifs, defined as Hidden Markov models in the Pfam database (pfam.xfam.org), that are considered typical of each TF or CR family. A survey of the original set of uncloned genes showed that it contained 743 transcription factors and 130 coregulators.

In order to explore the maize V4 genome models, it was necessary to first convert gene model names from their genome V3 names (GRMZM2GXXXXXX format) to genome V4 names (Zm00001dXXXXXX format) using the conversion available from the MaizeGDB server. Only a few (n = 40) V3 gene models were not found in the V4 genome, and an additional search was needed to find new models within the V4 genome. Care was taken to note changes in gene coordinates in the updated genome. One advantage of the new gene model names is that they are numbered sequentially along the chromosome, so that it is easy to detect tandem repeats. Table 1.1 lists the tandem duplications found among uncloned TFs in the maize genome.


Table 1.1: Tandem Duplications of Uncloned Transcription Factors

Suggested Name  V4 Gene Name     V3 Gene Name      Model Rank  FPKM¹   Maximally Expressed Tissue²
ZmHB33          Zm00001d033378   GRMZM2G011588     5           81.06   mature silk
ZmHB105         Zm00001d033379   GRMZM2G163761     5           3.42    Shoot apex
ZmGLK12         Zm00001d036802   GRMZM2G401835     3           0.55    25 DAP endosperm
ZmOrphan17      Zm00001d036803   GRMZM2G401821     2           2.25    Embryo 14 DAP
ZmHB47          Zm00001d038811   GRMZM2G474656     5           5.94    Ear primordia
ZmHB92          Zm00001d038812   GRMZM2G094935     5           14.93   P7 blade L12
ZmCPP13         Zm00001d041444   AC203865.3_FG001  4           11.99   Endosperm 14DAP
ZmCPP10         Zm00001d041445   GRMZM2G701689     5           125.64  post emergence cob
ZmGRAS72        Zm00001d048602   GRMZM2G073805     5           2.28    Leaf Meristems Drought stressed
ZmGRAS26        Zm00001d048603   GRMZM5G868355     4           6.62    Seedling shoots
ZmGRAS34        Zm00001d048604   GRMZM2G163427     5           6.40    Leaf Meristems Drought stressed
ZmOrphan101     Zm00001d048989   GRMZM2G407605     1           0.31    Developing Leaf
ZmOrphan97      Zm00001d048990   GRMZM2G407623     1           0.71    Shoot field
ZmSBP1          Zm00001d049822   GRMZM2G101511     5           16.33   pre emergence cob
ZmSBP32         Zm00001d049824   AC233751.1_FG002  5           14.97   post emergence cob
ZmJMJ6          Zm00001d051959   GRMZM2G107109     5           1.00    P7 ligule L11
ZmJMJ9          Zm00001d051961   GRMZM2G140524     5           13.32   P7 sheath L12
ZmJMJ2          Zm00001d051964   GRMZM2G171163     4           0.79    Maize b1 control
ZmJMJ4          Zm00001d051965   GRMZM2G027075     5           3.37    Endosperm 14DAP

¹ FPKM = fragments per kilobase of transcript per million mapped reads, in the tissue in which the highest expression of this gene was observed.
² Maximally Expressed Tissue = maize tissue for which the FPKM value is greatest amongst the 66 RNA-Seq datasets used in this study. DAP = days after pollination.
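Because V4 gene names are numbered sequentially along each chromosome, candidate tandem duplicates such as those in Table 1.1 can be flagged by looking for gene IDs with neighbouring serial numbers. The sketch below is a rough heuristic rather than the method used in this study: the function name and the `max_gap` parameter are hypothetical, and proximity of IDs alone does not establish that two genes belong to the same family.

```python
import re
from typing import Iterable, List

# V4 maize gene IDs have the form Zm00001d followed by six digits.
V4_ID = re.compile(r"^Zm00001d(\d{6})$")


def cluster_adjacent_ids(gene_ids: Iterable[str], max_gap: int = 1) -> List[List[str]]:
    """Group V4 gene IDs whose serial numbers differ by at most max_gap.

    Only groups with two or more members are returned.
    """
    numbered = []
    for gene_id in set(gene_ids):
        match = V4_ID.match(gene_id)
        if match is None:
            raise ValueError(f"not a V4 gene ID: {gene_id}")
        numbered.append((int(match.group(1)), gene_id))
    numbered.sort()

    clusters: List[List[str]] = []
    current: List[tuple] = []
    for serial, gene_id in numbered:
        # Close the running group when the next ID is too far away.
        if current and serial - current[-1][0] > max_gap:
            if len(current) > 1:
                clusters.append([g for _, g in current])
            current = []
        current.append((serial, gene_id))
    if len(current) > 1:
        clusters.append([g for _, g in current])
    return clusters


if __name__ == "__main__":
    ids = [
        "Zm00001d033378", "Zm00001d033379",                    # ZmHB33, ZmHB105
        "Zm00001d048602", "Zm00001d048603", "Zm00001d048604",  # ZmGRAS72, 26, 34
        "Zm00001d049822", "Zm00001d049824",                    # ZmSBP1, ZmSBP32
    ]
    for group in cluster_adjacent_ids(ids, max_gap=2):
        print(group)
```

With `max_gap=1` only strictly consecutive IDs are grouped; a larger value also picks up pairs such as ZmSBP1 and ZmSBP32, whose IDs are separated by one intervening gene model.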

A scan of the V4 genome was performed using updated HMMs for each family (Pfam release 31) and the HMMER (v3) software, with hits required to score above the gathering threshold reported for each domain HMM in the Pfam database (Eric Maina, personal communication). This scan revealed a possible 465 additional transcription factor candidates, and most families of transcription factors gained new potential candidates. New families and subfamilies have been added to accommodate the new potential genes. These include NF-X1, C2C2-LSD, and TIFFY. In addition, new HMMs were built for a few families (HRT, Trihelix, ULT, VOZ, NOZZLE, LUG, G2-like, Alfin-like, CCAAT-Dr1, NF-YB, NF-YC, VARL, and STAT) (Eric Maina, personal communication). No new genes were added to the count of coregulators. In total, 64 TF families were identified with a total of 3163 TFs and 329 CRs (Fig. 1-1). Of these, the largest group of uncloned genes (n = 167) was within the Orphans family (n = 359), which includes models with one or more but not all conserved features of a given family. The largest increase was in the Zn-coordinating C2H2 family of TFs, from a total of 10 to 152, so this now ranks as the fifth largest TF family in maize. It appears that the number was greatly underestimated in V3 of the genome, and so it represents a significant gap in the maize TFome V1.0. A total of 24 CR families were found in the maize genome with a total of 329 gene models (Fig. 1-2). Within the CRs, the families that are most incomplete in terms of cloning are the SNF2, SWI/SNF-BAF60b, Rcd1-like, and DDT families, in that order. In all, it was estimated that there are currently 1333 and 201 uncloned TFs and CRs, respectively, in the maize genome.
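A family scan of this kind is commonly run with HMMER's `hmmsearch`, for example `hmmsearch --cut_ga --domtblout hits.domtbl families.hmm proteins.fa`, where `--cut_ga` applies each model's gathering threshold. Whether the scan in this study used that option is not stated, so it is an assumption here, as are the file names. The sketch below shows how the resulting per-domain table could be grouped into candidate genes per family; the function name and the optional `min_score` filter are hypothetical, and the sample rows in the demonstration are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def candidates_by_family(lines: Iterable[str], min_score: float = 0.0) -> Dict[str, Set[str]]:
    """Group hmmsearch --domtblout rows into {HMM name: set of target sequences}.

    In the domtblout layout, column 0 is the target sequence name, column 3 is
    the query (HMM) name and column 7 is the full-sequence bit score.
    """
    hits: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        # 22 whitespace-delimited columns, then a free-text description.
        fields = line.split(maxsplit=22)
        target, family, seq_score = fields[0], fields[3], float(fields[7])
        if seq_score >= min_score:
            hits[family].add(target)
    return dict(hits)


if __name__ == "__main__":
    # Two made-up rows in domtblout layout, for demonstration only.
    sample = [
        "# target name  accession  tlen  query name ...",
        "protA - 350 AP2 PF00847.20 51 1.2e-20 75.3 0.1 1 1 3e-24 1.2e-20 75.3 0.1 1 51 100 150 98 152 0.95 made-up hit",
        "protB - 200 AP2 PF00847.20 51 0.5 8.0 0.0 1 1 0.1 0.5 8.0 0.0 1 20 5 25 3 30 0.60 made-up weak hit",
    ]
    print(candidates_by_family(sample, min_score=20.0))
```

Collecting targets in a set means a protein with several domain hits to the same model is counted once per family.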


[Figure 1-1: bar graph. Axes: TF Family versus Number of Genes. Legend: Total Number of Genes in Gene Family; Number of Cloned Genes in Gene Family.]

Figure 1-1. Bar graph of the total number of maize transcription factor V4 gene models and the number of previously cloned genes within each gene family. The total number of genes includes putative models from V4 of the maize genome that were not predicted in V3 of the genome, which are displayed in green. The number of cloned genes is represented in yellow.

[Figure 1-2: bar graph. Axes: CR Family versus Number of Genes. Legend: Total Number of Genes in Gene Family; Number of Cloned Genes in Gene Family.]

Figure 1-2. Bar graph of the total number of maize CoRegulator (CR) V4 gene models and the number of previously cloned genes within each gene family. The total number of genes includes putative models from V4 of the maize genome that were not predicted in V3 of the genome, which are displayed in green. The number of cloned genes is represented in yellow.

1.3.2 Evaluation of Gene Model Quality for Uncloned TFs and CRs in the Maize Genome

Next it was determined if the predicted models for the uncloned TFs and CRs were sufficiently reliable for cloning either by RT-PCR or gene synthesis. Since the maize

TFome V1.0 collection was constructed, a large increase in the number of publicly available RNA-Seq datasets has occurred. In the absence of full length cDNA (flcDNA)

13

information, these datasets are ideal for determining support for the predicted exonic sequences as well as the intron and exon splicing borders by mapping the reads back to

V4 of the maize genome. For this objective, 66 RNA-Seq datasets from 9 separate gene expression studies in maize were downloaded from the qTELLER database (Materials and Methods). The reads were mapped to the maize genome V4 using an analytical pipeline which is outlined in Fig. 3 and Materials and Methods.

Raw RNA-seq Reads qTeller.com Figure 1-3. Analysis pipeline for mapping Alignment to of RNA-Seq reads to maize genome V4.37. Maize genome See text for details. V 4.37 HiSat2 Version 2.1.0 The mapped reads were viewed using the

Unsorted Integrated Genome Viewer (IGV) which BAM file generated permits the read support for each gene model Samtools Version 1.5 to be interrogated. Gene models were ranked Sorted BAM & BAI files according to the density of RNA-seq reads generated and how well the predicted gene model was IGV Version 2.3.93 supported (Fig. 1-4 and Table A.1 and A.2).

Gene models that had full read coverage of

Visible, Aligned every exon and intron received a score of 5. RNA-seq Reads There were 344 transcription factors and 83 coregulators that met these criteria. These high-ranking genes will be cloned and enter the

Maize TFome collection. A ranking score of 4 indicates that the gene model had nearly

100% read support, but a section of the CDS was missing. For example, genes lacking an

14

A B Ranking 0- Zm00001d049309 Ranking 1- Zm00001d039700

C D Ranking 2- Ranking 3-

Zm00001d047752 Zm00001d033508

E Ranking 4- Zm00001d005740 F Ranking 5- Zm00001d034298

Figure 1-4. Ranking System for Maize Gene Model Support based on RNA-Seq Data. A-F. Examples of gene models from maize genome V4 that varied in RNA-seq read mapping support. A: Gene model ranked 0 with no read support, B: Gene model ranked 1 with less than 20% read support, C: Gene model ranked 2 with <50% read support and intron splicing patterns not supported, D: Gene Model with >50% read support and partial splice pattern support, E: Gene model with almost full read support but gap exists in genome, E: Full read support including splice patterns and no genome gaps.

15

exon or containing a gap in the genome would receive a score of 4. Sometimes, a lack of read support for an exon or intron can indicate the presence of an alternative splice site.

There were 183 TFs and 15 CRs that had a ranking of 4, which could also be placed on the cloning list to be cloned by RT-PCR but not gene synthesis. Poorly supported models received a score of 0-3. Models with a score of 3 have read support for some exons and introns, but not most. Genes with multiple large gaps in the CDS will also receive a ranking of 3. There were 84 TFs and 13 CRs ranked as a 3. Genes with a score of 2 have little read support and the reads often do not support the presence of the canonical gene model. 45 TFs and 4 CRs had poor support for the gene model and received a ranking of

2. Genes ranking as a 1 had as few as one read supporting the model. There were 48 TFs and 4 CRs that had this level of read support. Lastly, models with no reads supporting them at all received a score of 0. All coregulators had some level of read support. 35 TFs had no RNA-seq support whatsoever. Overall, it was estimated that 50% and 69% of the uncloned TFs and CRs respectively had full support for the predicted gene models and could be cloned by either gene synthesis or RT-PCR (Fig 1-5A). If models with a level 4 designation are also included then these numbers rose to 75% and 81% respectively. A general observation was that gene models with low support (rank 0-2) also exhibited lower maximal expression levels in the plant compared to gene models with strong read support (rank 4-5) (Fig. 1-5B). It was concluded that a majority of the remaining uncloned genes had strong gene model support and could be readily cloned for addition to the maize TFome V2.0. The models ranked 0 through 3, however were deemed sufficiently unreliable for cloning at this time. These models have not been sufficiently supported with experimental evidence in the V4 of the genome. An example of this is the 16

A

No. Gene of Models

0 1 2 3 4 5

Gene Model Rank

B Expression LevelExpression (FPKM)

Highest

Gene Model Rank

Figure 1-5. Summary of Ranked Maize TF and CR Gene Model Support based on RNA-Seq Data. A. Histogram of gene model rank based on RNA-Seq support versus the number of gene models with that rank. B. Scatterplot of genes with an assigned gene model rank versus the highest expression level (FPKM) of that gene reported in an RNA- seq experiment.

orphan gene GRMZM2G068504, for which a corresponding standalone model was not predicted in V4 of the genome. However, close examination of where this gene is found in the V4 genome reveals that it has been incorporated into a longer model named Zm00001d038407 (Fig. 1-6A).

[Figure 1-6 image. Panel A: V3 model GRMZM2G068504 shown against V4 model Zm00001d038407. Panels B and C: see caption.]

Figure 1-6. Example of V3 models not corresponding to gene models from Maize genome V4 that varied in RNA-seq read mapping support. A: Orphan model GRMZM2G068504 (V3) overlaps with the end of Zm00001d038407 from V4 of the genome. The new model encodes a protein named “Protein NETWORKED 2D”, which is part of a complex that anchors the cellular actin cytoskeleton to membranes. B: This gene was likely assigned a putative TF function because it contains a CCT motif (pfam06203) found in TFs such as Nuclear transcription factor Y subunit B-8 (AQK99543.1). The nearby zinc finger domain (pfam13966) would appear to be a zinc-binding region of a putative reverse transcriptase, indicating the possible presence of a transposon in this region of the genome (with very low read support). C: There are no existing transcripts that connect the predicted 5’ and 3’ segments of the gene.

The N-terminal region of the predicted protein is 99% identical to a protein named Protein NETWORKED 2D, which is part of a complex that anchors the cellular actin cytoskeleton to the membrane (Deeks et al., 2012). The gene was likely assigned a putative TF function because it contains a CCT motif (pfam06203) found in TFs such as Nuclear transcription factor Y subunit B-8 (AQK99543.1) (Fig. 1-6B). The nearby zinc finger domain (pfam13966) would appear to be a zinc-binding region of a putative reverse transcriptase, indicating the possible presence of a transposon in this region of the genome (with very low read support). Although Zm00001d038407 is predicted to have several alternative transcripts (Fig. 1-6C), none of these predicts that the N-terminal and C-terminal regions are translated as one protein. It is likely that these are two separate genes, but the reads do not clearly support either the V3 or the V4 model.

1.4 Discussion and conclusions

One of the main goals of this study was to re-evaluate the maize TFome with particular emphasis on the uncloned members that were omitted from the released clone collection, based on version 3 of the maize reference genome. This objective was accomplished by taking advantage of an updated reference genome (V4) to revisit the predicted TFome repertoire and by availing of large RNA-Seq datasets to refine the uncloned gene models.

Gene models were not numbered consecutively in V3 of the reference genome, but they are in V4. In converting the gene model names, it was found that while most V3 genes have a V4 model match, some do not, causing some discrepancy between the sequences of old and new models. Some individual V3 genes were combined, while other models were omitted entirely from the V4 genome. A closer examination of some of these gene models (such as that of GRMZM2G068504 in Fig. 1-6) revealed instances of incorrectly predicted models. In some cases, RNA-Seq data could correct these models, but in others the data remain inconclusive. In the case of the large 236-member EREBP family, for example, 27 members (11%) had to be omitted from a multiple sequence alignment due to deficiencies in the gene models such as collapsed gene models, genome gaps, or missing models in the V4 genome. It was also found that some genes could be reclassified due to improved genome sequence, such as ZmOrphan299 being reclassified as a member of the EREBP family. Collapsed genes also pose a problem for the completion of the TFome. For example, ZmOrphan299 and ZmEREB120 possess atypical gene model structures and 9 predicted AP2 domains, while most members of the AP2 subtype should only have 2 (Klucher et al., 1996). These particular TFs are only three genes apart within the genome, so the possibility of tandem duplication exists. These genes could also have been disrupted by transposons. Such areas of uncertainty require further investigation because RNA-seq, synteny, and EST data are lacking. The question still remains: are ZmEREB120 and ZmOrphan299 still genes? This question may be resolved by nanopore sequencing of this region to remove ambiguities associated with assembling repetitive genome sequences.

Overall, most gene models did not undergo changes as a result of the improved reference genome. However, refined HMM models and new HMM models for a few TF families (RWP-RK, NF-X1, C2C2-LSD, and TIFFY) were employed by our colleagues to scan the V4 genome, and this revealed almost 500 new candidate TF genes. The largest increase was in the C2H2 family, from 10 to 151 predicted genes, with minor changes in many other TF families. This finding underscores the importance of having well-defined HMM models for scanning genomes and also the value of having improved reference genomes. In this study we did not yet evaluate all of the newly predicted models, but a preliminary review suggests that most of them are real and not redundant with the existing TFome.

The second objective of the study was to evaluate the reliability of the predicted but uncloned TF and CR gene models by assessing their experimental support through alignment of RNA-Seq reads to the models. The availability of 9 large RNA-seq datasets with tissue-specific expression data for 68 tissues from the qTELLER database greatly enhanced the ability to assess gene models. In many cases, genes that previously had no RNA-seq support are now fully supported (e.g. ZmbHLH133). In many instances, RNA-seq support indicated that the gene model was incorrectly annotated or that an alternate splice site is present. Full coverage of the gene models permitted corrections to be made, and ranking of genes by read coverage allowed the prioritization of genes for future cloning. The summary of gene model support for uncloned TFs and CRs (Fig. 1-5A) reveals that at least 75% are suitable for cloning by RT-PCR. Although gene synthesis could be performed, many of the uncloned genes have long coding sequences (> 1 kb) and thus synthesis is not likely to be cost effective. Nevertheless, a majority of uncloned TF and CR gene models ranked highly, with a score of 4 or 5; these have been tabulated (Table A-1), and primers have been designed for many of them. In addition, the RNA-Seq data were employed to discern the tissue in which a particular gene is most highly expressed (Table A-1). Thus, the main objective of this part of the study was successfully accomplished, providing a clear course of action for the expansion of the maize TFome clone collection. The remaining models with scores from 0 to 3 are currently deemed unsuitable for cloning. For example, from the ambiguous gene model for GRMZM2G068504 (Fig. 1-6) it currently cannot be discerned whether one or two genes are present. These poorly supported models require further study before cloning. Models whose RNA-seq reads do not support the annotated structure can be redrawn to reflect the read coverage. Sometimes RNA-seq reads will map to a possible splice site and will define clear exon and intron boundaries. Gene model structures can then be redrawn with the aid of homology support to fill in the regions of uncertainty.
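The prioritization rule described above can be summarized in a few lines of code. The following is only a minimal sketch and not the procedure used in this study: the names `RANK_COUNTS`, `cloning_strategy`, and `deferred_total` are hypothetical, and the counts are limited to those reported in the Results for ranks 0 to 4 (the rank 5 counts are not restated in this section and are therefore omitted).

```python
# Illustrative sketch only; all names are hypothetical and not part of the thesis pipeline.

# Uncloned gene models per rank, as reported in the Results (ranks 0-4 only).
RANK_COUNTS = {
    4: {"TF": 183, "CR": 15},
    3: {"TF": 84, "CR": 13},
    2: {"TF": 45, "CR": 4},
    1: {"TF": 48, "CR": 4},
    0: {"TF": 35, "CR": 0},
}


def cloning_strategy(rank):
    """Map a gene model rank (0-5) to the cloning route described in the text."""
    if rank not in range(6):
        raise ValueError("rank must be an integer from 0 to 5")
    if rank == 5:
        return "gene synthesis or RT-PCR"  # fully supported model
    if rank == 4:
        return "RT-PCR only"               # well supported, but not suited to synthesis
    return "deferred"                      # ranks 0-3: model needs further study


def deferred_total(kind):
    """Number of models of a given kind ("TF" or "CR") with ranks 0-3."""
    return sum(RANK_COUNTS[rank][kind] for rank in range(4))
```

Under these counts, `deferred_total("TF")` and `deferred_total("CR")` give the number of TF and CR models that are set aside pending further study.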

Gene models with little or no read support may represent incorrect genomic sequence and will have to await completion of genomic gaps using nanopore sequencing technology and transcriptomic data from more specialized cell types. Alternatively, they may represent genes that have undergone mutations and are either pseudogenes or in the process of losing functional activity. Flawed gene models raise the question: can the TFome ever be truly complete? The major challenges to TFome completion include poor genome quality (gaps), pseudogenes or collapsed genes, and incorrectly annotated genes. These challenges require a genome with better sequence coverage depth and re-annotation of flawed gene models. Genes could remain undetected if they are present within a genome gap, as exemplified by the version 3 genome. Pseudogenes and collapsed genes would have to be examined on an individual basis by comparison to other genes of the same family and to orthologs in related species. Incorrectly annotated genes also pose a challenge to the TFome because a functional domain could lie within a gene, yet in a different reading frame. Such an error would yield an incorrect CDS for the gene, possibly suggesting that the DBD is absent. Although these obstacles exist now, there are efforts to improve annotations within the maize genome. The Gramene database aims to improve the overall quality of the maize genome with the assistance of bioinformaticians (www.gramene.org). Whole-genome alignments, synteny data, and gene annotation are being employed to improve the classification of maize genes (Tello-Ruiz et al., 2018).

More extensive RNA-seq datasets covering a greater variety of tissues and stress conditions could also help support previously unsupported genes. A transcription factor that currently ranks poorly could be expressed within a tissue that was not part of the analysis. Ultimately, improvements in sequencing technology and annotation, together with more extensive RNA-seq data, would help support low-ranking genes, yet many transcription factors could still remain to be discovered.

In summary, a small but significant expansion of the predicted maize TFome was discerned with the advent of a new reference genome in maize. The use of RNA-Seq datasets and the V4 genome permitted the evaluation of gene models and the ranking of uncloned TF and CR genes for future cloning and long-term completion of the physical clone collection for the maize TFome. This resource has already proven useful to researchers investigating gene regulatory networks, and a more complete collection will permit more accurate networks to be discovered.


Chapter 2

Analysis of the AP2/EREBP Transcription Factor Family in Maize

2.1 Introduction

2.1.1 Overview of the AP2/EREBP TF Family in Maize

The AP2/EREBP (ethylene-responsive element binding protein) family is one of the largest families of transcription factors, not only in maize but also in other plant genomes. In version 3 of the maize genome there were a total of 212 predicted AP2/EREBP transcription factor gene models (Burdo et al., 2014). Other plant species have been reported to possess roughly 100 or more EREB genes. Amongst dicot plants the range is from about 98 in soybean to about 200 in poplar (Chen et al., 2016). Proteins encoded by AP2/EREBP family members possess at least one AP2 (APETALA2) domain, which is exclusive to the plant kingdom (Fig. 2-1). Some of the first family members with the AP2 domain were linked to the regulation of the ethylene response. The three-dimensional structure of an AP2/EREBP domain was first determined for the Arabidopsis protein AtERF1 (Allen et al., 1998).


[Figure 2-1 image. Diagram of AP2/EREBP domain proteins showing the ABI3/VP1, AP2/EREBP, RAV, and RAV-like families, with boxes marking the ABI3 domain and the AP2 domain.]

Figure 2-1: Relationship between Arabidopsis AP2 and ABI3 domain containing proteins. Diagram shows transcription factor families in the model plant Arabidopsis that contain the ABI3 domain, AP2 domain, or both. Circles are representative of the number of genes present within each family. The length of each box is proportional to the length of the domain. Arrows indicate that the gene family possesses the domain. Diagram modified from Riechmann et al., 2000.

[Figure 2-2 image. Panel A: AP2 domain structure. Panel B: B3 domain structure.]

Figure 2-2: Structures of the AP2 and B3 domains found in EREBP family members. A: 3D solution structure of the AP2 DNA binding domain from the AtERF1 protein from A. thaliana (PDB:1GCC) (Allen et al., 1998). The structure shows the contact points with the target GCC box motif. B: 3D solution structure of the B3 DNA binding domain (PDB:1WID) (Yamasaki et al., 2004). The predicted DNA binding region is to the right side of the schematic. Yellow and pink coloring denote beta-sheet and alpha-helical structure, respectively.


The AP2 domain is roughly 50-70 amino acids long and contains a GCC-box binding domain, which binds directly to the target gene (Fig. 2-2A) (Allen et al., 1998). The structure consists of a three-stranded anti-parallel beta sheet and an alpha helix (Allen et al., 1998). A B3 DNA binding domain is also found in a subfamily of EREBPs named RAV (Related to ABI3/VP1) proteins. The B3 structure consists of seven beta strands and two alpha helices that form a pseudobarrel protein fold (Fig. 2-2B).

2.1.2 Known Functions of AP2/EREBP Genes

Members of the AP2/EREBP family have been shown to play key roles in meristem and organ development in plants as well as in environmental stress responses. Historically, one of the first AP2/EREBP TFs was found to regulate floral organ development. According to the ABC model of floral organ development, TFs act to determine the identity of concentric whorls of tissue. The protein encoded by the AP2 gene of Arabidopsis was found to function in the outermost whorl of the floral meristem to determine sepals, and to work in the second whorl together with B-class genes to determine petals. AP2 is not expressed in the same whorls as AGAMOUS, which functions in the inner whorls to determine male and female organ identity (Shepard and Purugganan, 2002). Consistent with these developmental roles, some EREBs are highly expressed within developing tissues, such as the embryo and ovaries. In maize, at least 13 different EREBP genes have been studied in some detail and exhibit a spectrum of associated phenotypes from sex determination to leaf shape and drought response (Table 2.1). For example, ZmEREB11, also known as TASSELSEED6 (TS6), is involved in the formation of spikelets and sex determination (Charfeddine et al., 2015). This gene is most highly expressed in drought-stressed ovaries and the tassel primordia. AP2/EREBP transcription factors are also involved in maize stress response

Table 2.1: List of Maize AP2/EREBP Genes with known association

Suggested Name  Alternate Name  V4 Name         Associated Phenotype                                    Reference
ZmEREB1         DREB1           Zm00001d000179  Temperature response                                    (Liu et al., 2013)
ZmEREB3         DBF3            Zm00001d006169  Stress, dehydration response                            (Forestan et al., 2018)
ZmEREB11        TS6             Zm00001d034629  Sex determination, spikelet formation                   (Xu et al., 2017b)
ZmEREB20        WRI2            Zm00001d052405  Seed oil production                                     (Shen et al., 2010)
ZmEREB55        GL15            Zm00001d046621  Progression into flowering phase                        (Xu et al., 2017b)
ZmEREB56        DBP4            Zm00001d036003  Drought response, circadian rhythm                      (Jonczyk et al., 2017)
ZmEREB85        WRI1            Zm00001d005016  Seed oil production                                     (Shen et al., 2010)
ZmEREB110       RAP2            Zm00001d010988  Progression into flowering phase, temperature response  (Navarro et al., 2017)
ZmEREB121       PZAD03344       Zm00001d019230  Height regulation                                       (Zhang et al., 2018)
ZmEREB151       BD1             Zm00001d022488  Defective ear, silkless mutant, meristem identity       (Chuck et al., 2002)
ZmEREB152       HSCF1           Zm00001d046913  Temperature response                                    (Klucher et al., 1996)
ZmEREB171       DIL1            Zm00001d038087  Semi dwarf, irregular leaf shape mutant                 (Jiang et al., 2012)
ZmEREB185       DREB4.1         Zm00001d002747  Negative regulation of plant growth and development     (Brohammer et al., 2018)
ZmEREB204       DBF1            Zm00001d032295  Drought response                                        (Kizis and Pages, 2002)

by regulating ethylene. Ethylene is a vital gaseous plant hormone that induces the ripening of fruit and the shedding of leaves. It is produced in all tissues of flowering plants. Wounding or stressing a plant stimulates the release of ethylene and initiates a transcription factor response. Stressors such as drought, extreme temperatures, and high salinity are known to induce a transcription factor response (Liu et al., 2013). ZmEREB3 (DBF3), ZmEREB56 (DBP4), and ZmEREB204 (DBF1) are known to be activated during drought stress conditions (Kizis and Pages, 2002; Liu et al., 2013; Qin et al., 2003). These genes have also been named “drought response element binding factor” (DREB) genes due to their expression during dehydration conditions (Liu et al., 2013). These genes exhibit a simple one-exon gene structure and encode a protein with one AP2 domain. DREB proteins bind to GCC-box sequences, like other AP2/EREBs, as well as to drought response elements containing an ACCGAC core target sequence (Charfeddine et al., 2015; Liu et al., 2013; Qin et al., 2003). DREB genes are highly expressed while the organism is experiencing abiotic or biotic stress. ZmEREB110 (RAP2), ZmEREB152 (HSCF1), and ZmEREB1 (DREB1) are responsive to extreme changes in temperature, and ZmEREB1 in particular assists the plant in tolerating extreme cold (Klucher et al., 1996; Qin et al., 2003; Zhuang et al., 2010). Some EREBs have known mutant phenotypes and are members of the classical gene set. A nonsense mutation occurring in the AP2 DNA-binding domain (DBD) can compromise the gene’s function, resulting in plant insensitivity to ethylene. Some notable examples are: ZmEREB151 (BD1) mutants, which show defective ears and lack silks; ZmEREB171 (DIL1) mutants, which exhibit irregularly shaped wrinkled leaves; and ZmEREB55 (GL15) mutants, which have the glossy phenotype (Chuck et al., 2002; Jiang et al., 2012; Xu et al., 2017a). ZmEREB55 is also a gene that distinguishes maize from its close relative, teosinte.
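The target sequences mentioned above lend themselves to a simple illustration. The sketch below scans a DNA sequence on both strands for the GCC box consensus (AGCCGCC, as quoted in Section 2.3.2) and for the DRE consensus (A/GCCGAC). It is offered only as an example of how such motifs might be located; it is not part of the analyses performed in this study, the function names are hypothetical, and a motif match by itself does not demonstrate binding.

```python
import re

# Motif sequences as quoted in this chapter; the scanning approach is illustrative only.
MOTIFS = {
    "GCC box": "AGCCGCC",
    "DRE": "[AG]CCGAC",  # A/GCCGAC consensus
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motifs(seq):
    """Return sorted (motif name, strand, 0-based start on the input sequence) hits."""
    seq = seq.upper()
    hits = []
    for strand, target in (("+", seq), ("-", reverse_complement(seq))):
        for name, pattern in MOTIFS.items():
            # A lookahead is used so that overlapping occurrences are also reported.
            for match in re.finditer("(?=(%s))" % pattern, target):
                start = match.start()
                if strand == "-":
                    # Convert to the leftmost forward-strand coordinate of the hit.
                    start = len(seq) - (match.start() + len(match.group(1)))
                hits.append((name, strand, start))
    return sorted(hits)
```

For a hypothetical test sequence such as `TTAGCCGCCTTACCGACAA`, `find_motifs` reports one GCC box and one DRE on the forward strand.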

2.1.3 AP2/EREBP Subfamily RAV

A subset of the AP2/EREBP family, known as RAV (Related to ABI3/VP1), contains a B3 domain in addition to the AP2 domain (Matias-Hernandez et al., 2014). RAV subtypes are considered members of the AP2/EREBP family rather than the ABI3 family because they possess an AP2 domain. There are four known AP2/EREBP-RAV genes present in the maize genome, and six in the Arabidopsis genome (Riechmann et al., 2000). These single exon genes share the functionality of both the AP2/EREBP and ABI3/VP1 families (Riechmann et al., 2000). As described above and in Fig. 2-2B, the B3 domain forms a pseudobarrel structure. Proteins that contain B3 DNA binding domains regulate transcription in response to plant hormones such as abscisic acid and are typically involved in developmental processes involving these hormones (Yamasaki et al., 2004).

2.1.4 The Influence of Whole Genome Duplication (WGD) on maize TF repertoire

Several EREBs in maize have redundant functionality. This is likely a consequence of the WGD event in maize that occurred approximately 20 million years ago, shortly after divergence from sorghum (Fig. 2-3).

[Figure 2-3 image. Time line of grass evolution. Labels in the image: common whole genome duplication (cWGD, ~96 MYA); common grass ancestor (~70 MYA); ~25 MYA; mWGD; scale bar 10 MY. Species shown: Oryza sativa, Zea mays I & II, Aegilops tauschii, Triticum urartu, Hordeum vulgare, Sorghum bicolor, Panicum virgatum, Brachypodium.]

Figure 2-3. Whole Genome Duplication (WGD) Event in Zea mays lineage. The modern grasses are derived from a common grass ancestor that underwent a WGD event about 75 20 million years ago (MYA). The maize genome underwent a second WGD event approximately 20 MYA following divergence from a common ancestor with sorghum about 25 MYA. As a result, an estimated 66% of maize genes are duplicated (Adapted with permission from Agarwal et al., 2016).


Chromosome rearrangements occurred shortly thereafter to form ten chromosomes (n=10). The two subgenomes are referred to as maize1 and maize2, which each contained five identical chromosomes before rearrangement. A recent study indicates that the maize1 subgenome genes exhibit more gene retention and higher expression levels than the maize2 subgenome genes. In addition, many of the duplicate genes resulting from the WGD likely gained new functions through mutations. Some genes are redundant and can exhibit varying levels of expression or slightly different coding sequences (Li et al., 2016). Other genes can become deleterious to the organism when there is more than one copy. When this happens, one of the copies of the gene can acquire mutations and undergo neofunctionalization or nonfunctionalization, often resulting in changes in temporal or tissue-specific expression. Since regulatory genes play a central role in maintaining the correct balance of target gene expression, the effects of WGD events are of particular relevance in studying gene regulatory networks (GRNs). In this study the effect of a WGD on duplicate genes within the maize EREBP TF family will be examined.

2.5 The Presence of Redundant Syntenic Genes

A direct consequence of a WGD event is synteny. Synteny naturally occurs when regions of a chromosome remain largely intact and collinear after species divergence (Liu et al., 2018). Species that diverged more recently possess more synteny. For example, maize is highly syntenic with sorghum and less so with rice, from which it diverged approximately 25 and 70 MYA, respectively (Fig. 2-3). Within syntenic blocks, genes and their surrounding (flanking) neighbors share order and strong similarity in coding sequence and function (Fig. 2-4). Recombination often occurs at a lower rate within these syntenic blocks (Liu et al., 2018). Synteny can also arise within a species (self-synteny) as a result of a

[Figure 2-4 image. Maize chromosome 6 (174.03 MB), region 143.97 MB to 150.79 MB (6.82 MB), aligned with sorghum chromosome 9 (59.42 MB), region 44.31 MB to 51.18 MB (6.87 MB).]

Figure 2-4. Example of Syntenic Region detected between Maize and Sorghum using SyMAP 4.2. Schematic of syntenic block between maize chromosome 6 and sorghum chromosome 9. Brown lines indicate location of syntenic genes and the blockscore calculated for this syntenic segment was 992.

WGD event, as occurred in maize. It is estimated that approximately two thirds of maize genes have syntelogs within the genome. Genes that are syntenic usually share the same gene structure and function. Syntenic genes are highly homologous, and this can be strongly suggestive of retention of orthologous function. The converse is not true, however, as homologous genes are not necessarily syntenic. In this study, the syntenic relationship between members of the large EREBP family was determined and examined in relation to co-expression in an attempt to detect functional redundancy.

2.1.5 Study Objectives

It was hypothesized that synteny should be detectable amongst about two thirds of the maize EREBP family members. It was further hypothesized that changes in gene expression pattern can be detected between syntelogs and that these may be used as a measure of the extent to which syntelogs in this family remain functionally redundant following the WGD event in maize. To test these hypotheses, the following experimental analyses were performed:

1: A survey of the EREBP family in version 4 of the maize reference genome was conducted and new models were included. Where possible, ZmEREBP gene models were corrected using the RNA-Seq mapping data from Chapter 1.

2: A multiple sequence alignment was performed and a phylogenetic tree constructed to assess relationships amongst ZmEREBP family members based on sequence homology.

3: A self-synteny analysis of the maize genome V4 was performed and EREBP syntelogs were identified.

4: Co-expression analysis of ZmEREBP genes was conducted using RNA-Seq based expression data from 42 maize tissues. The co-expression pattern of EREBP syntelogs was analyzed to determine the extent to which duplicate genes retained or diverged in their expression patterns.

2.2 Materials and Methods

2.2.1 Synteny Data Extraction of the AP2/EREBP Family

A self-synteny analysis of the maize genome version 4 was performed utilizing the SynMap algorithm available via the CoGe platform for performing comparative genomics research (genomevolution.org). The organism parameters were set to Zea mays (maize;corn)(id33766) against itself for the analysis. A complete table of all self-syntenic genes within maize was obtained, and individual syntenic relationships could be further examined using the CoGe SynFind browser. In most instances, a syntenic region contained a duplicate EREBP gene so syntenic pairs could be defined. In some instances, the syntenic region lacked an EREBP syntelog and this was taken as evidence of a gene deletion. Syntelog names were corrected manually or annotated if no syntelog was present. Transcription factors were assigned to their top three syntelogs if applicable.
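The pairing logic described above can be illustrated with a short sketch. The code below is not the procedure used with the SynFind output; the input layout (a dictionary of gene IDs mapped to syntelog lists, with `None` marking a syntenic region in which no gene was found) and the function name `classify_syntelogs` are assumptions made purely for illustration.

```python
def classify_syntelogs(erebp_genes, syntelogs, max_partners=3):
    """
    erebp_genes: set of gene IDs already annotated as EREBP family members.
    syntelogs:   dict mapping a gene ID to a list of syntenic gene IDs, best
                 match first; None marks a syntenic region with no gene found.
    Returns a dict: gene -> {"pairs": [...], "candidates": [...], "deleted": bool}.
    """
    result = {}
    for gene in sorted(erebp_genes):
        partners = syntelogs.get(gene, [])[:max_partners]  # top three by default
        named = [p for p in partners if p is not None]
        result[gene] = {
            # Syntelogs that are known family members form syntenic pairs.
            "pairs": [p for p in named if p in erebp_genes],
            # Syntelogs outside the family are candidates to check for an AP2 domain.
            "candidates": [p for p in named if p not in erebp_genes],
            # A syntenic region that lacks a syntelog is taken as a gene deletion.
            "deleted": len(partners) > 0 and len(named) == 0,
        }
    return result
```

Genes that do not appear in the syntelog table at all are reported with empty lists and `deleted` set to `False`, since no syntenic region was detected for them.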

Syntenic genes were extracted from the SynFind output file. Self-synteny within the masked maize genome V4 was also computed by running the SyMAP V4.2 program locally in a Linux environment on a Mac Pro with 8-core processors. The parameters included a minimum block size of 100,000. Self-synteny block coordinates from SyMAP were used in generating synteny maps. In 23 instances, the AP2/EREBP family synteny analysis revealed syntelogs that were not previously known as ZmEREBP family members. These syntelogs were examined with the CDbatch search tool on the NCBI server (ncbi.nlm.nih.gov), and genes coding for proteins that contained at least one AP2 domain became candidates for addition to the ZmAP2/EREBP family. Syntenic associations were displayed using Circos software (version 0.69-6). Synteny block coordinates were obtained from the blocks file generated by SyMAP. EREBP gene coordinates for the maize genome V4 were obtained from MaizeGDB (maizegdb.org). The Circos script was adapted from Holmes et al., 2017. The configuration file contained ribbon coordinates for blocks of synteny, an ideogram of all gene labels, and coordinates of all genes being studied. Separate gene coordinate files and ribbon links files were used to generate each Circos diagram. Configuration and other files are included in Appendices E, F and G.
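As an illustration of how block coordinates can be turned into ribbon links, the sketch below writes lines in the six-column Circos link format (chromosome, start, end for each side of the link). It is not the script included in the appendices: the input tuple layout, the chromosome prefix `zm`, and the interpretation of the minimum block size as a length in base pairs are all assumptions made for this example.

```python
def blocks_to_circos_links(blocks, min_block_size=100_000, chr_prefix="zm"):
    """
    blocks: iterable of (chr_a, start_a, end_a, chr_b, start_b, end_b) tuples.
    Returns lines in the Circos link format:
        chrA startA endA chrB startB endB
    Blocks shorter than min_block_size on either side are skipped.
    """
    lines = []
    for chr_a, s_a, e_a, chr_b, s_b, e_b in blocks:
        # Normalize so that start <= end (note: this discards block orientation).
        s_a, e_a = sorted((s_a, e_a))
        s_b, e_b = sorted((s_b, e_b))
        if min(e_a - s_a, e_b - s_b) < min_block_size:
            continue
        lines.append(
            f"{chr_prefix}{chr_a} {s_a} {e_a} {chr_prefix}{chr_b} {s_b} {e_b}"
        )
    return lines
```

The chromosome names written to the link file must match those defined in the karyotype file used for the diagram, so the prefix would need to be adjusted accordingly.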

2.2.2 Coexpression Analysis of the AP2/EREBP Family


The expression of the existing 212 AP2/EREBP transcription factors and 24 new candidate transcription factors was analyzed using data extracted from the qTELLER database (qteller.com). Processed RNA-Seq data from 42 of the original 66 RNA-Seq datasets were employed in the analysis. The processed files contained FPKM values for maize genome V4 gene models from these tissues. The remaining 24 tissues exhibited extremely low or no overall gene expression values for the majority of the gene models and were therefore omitted from the analysis. Coexpression analysis was performed in an R environment (V3.4.4) using the R packages RColorBrewer, gplots, Hmisc, corrplot, and reshape2. The output file contained the correlation coefficient and p value for each pair of AP2/EREBP transcription factors. Gene pairs with a correlation coefficient above 0.70 or below -0.70 were considered significant.
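The thresholding step can be sketched as follows. The analysis itself was carried out in R; the Python code below is only an illustrative equivalent that assumes a Pearson correlation coefficient (the type of coefficient is not stated above) and omits the p value calculation. The function names and the input layout are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return float("nan")  # undefined when a gene has constant expression
    return sum(a * b for a, b in zip(dx, dy)) / denom


def coexpressed_pairs(fpkm, threshold=0.70):
    """
    fpkm: dict mapping gene ID -> list of FPKM values (same tissue order for all genes).
    Returns [(gene_a, gene_b, r), ...] for pairs with r > threshold or r < -threshold.
    """
    pairs = []
    for gene_a, gene_b in combinations(sorted(fpkm), 2):
        r = pearson(fpkm[gene_a], fpkm[gene_b])
        if abs(r) > threshold:  # NaN compares False, so undefined pairs are skipped
            pairs.append((gene_a, gene_b, r))
    return pairs
```

Each gene would be represented by its FPKM values across the 42 retained tissues, listed in the same tissue order for every gene.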

2.2.3 Phylogenetic Analysis of the AP2 domain

AP2/EREBP transcription factor coding sequences were mined from maizegdb.org. The presence of protein domains was confirmed by a CDbatch search. For multiple sequence alignment, the coding sequence of the conserved AP2 domain for each transcription factor was used as the operational taxonomic unit (OTU). Ten amino acids upstream and 2 amino acids downstream of the AP2 domain were included in the OTU. Some AP2 domains had to be extended from the region predicted by CDbatch. For members of the family that contained two or more AP2 domains, the sequence of the second AP2 domain was used as the OTU because, unlike the first AP2 domain, it contained no mismatches. Some V4 AP2/EREBP family members lacked AP2 domains that were predicted in the V3 models and were omitted from the analysis pending model correction. AP2/EREBP models that were incorrectly annotated were also left out of the alignment. A total of 208 EREB models were aligned. AP2 domain sequences were aligned in MegAlignPro version 15.0.1.1 using the MUSCLE multiple sequence alignment tool. Alignments with smaller subsets of EREBP proteins were performed using the Clustal Omega algorithm within the MegAlignPro® software (DNASTAR®, Madison WI).
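The construction of each OTU can be sketched as a simple slicing operation. The code below is an illustration only and was not used in this study: the function names are hypothetical, domain coordinates are assumed to be 1-based and inclusive, and the flank lengths are left as parameters (defaulting to the 10 upstream and 2 downstream residues stated above) so that other values can be tried.

```python
def extract_otu(protein_seq, domain_start, domain_end, upstream=10, downstream=2):
    """
    Return the AP2 domain plus flanking residues.

    domain_start, domain_end: 1-based inclusive coordinates of the domain, as
    reported by a domain search. Flanks are truncated at the protein ends.
    """
    if not (1 <= domain_start <= domain_end <= len(protein_seq)):
        raise ValueError("domain coordinates fall outside the protein")
    begin = max(domain_start - 1 - upstream, 0)            # 0-based, inclusive
    end = min(domain_end + downstream, len(protein_seq))   # 0-based, exclusive
    return protein_seq[begin:end]


def pick_domain(domains):
    """For proteins with two or more AP2 domains, use the second one."""
    ordered = sorted(domains)  # (start, end) tuples ordered by position
    return ordered[1] if len(ordered) >= 2 else ordered[0]
```

For a protein with two predicted AP2 domains, `pick_domain` selects the coordinates of the second domain, which are then passed to `extract_otu`.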

2.3 Results

2.3.1 Survey of EREBP Gene Structures in Maize

To initiate this study of the EREBP TF family in maize, the previous gene models were re-examined as described in Chapter 1 by assessing the support for the gene structure using RNA-Seq data. A total of 212 EREBP gene models had been predicted in the maize V3 reference genome and only one of these was not represented by a V4 gene model. However, 12 of the new models did not appear to encode an entire AP2 domain, often due to gaps in the genome, and could not be included in subsequent phylogenetic analysis. In addition, 24 models were subsequently discovered during synteny analysis (below), including two that were reclassified from other TF families. An examination of the total 236 gene models revealed that there are three main gene structures, which reflect the protein domains that they encode (Fig. 2-5). All members of the AP2/EREBP family contain an AP2 domain, but their gene models can have varying structures. The most common AP2/EREBP gene structure is a single exon gene encoding a protein with one AP2 domain (n=165) (Fig. 2-5 A1). In some subclades of the family an intron is present either within the AP2 encoding region (n=21) or just upstream (n=16) (Fig. 2-5 A2-3). EREBP genes encoding more than one AP2 domain have several exons (n=28) (Fig. 2-5B). Lastly, EREBP genes containing a single AP2 domain as well as a B3 domain are single exon genes (n=5) (Fig. 2-5C).

[Figure 2-5 image. Representative gene structures: A1, ZmEREB17 (n=165); A2, ZmEREB207 (n=21); A3, ZmEREB111 (n=16); B, ZmEREB77 (n=28); C, ZmEREB49 (AP2/EREBP-RAV) (n=5). Legend: AP2 domain; B3 or ABI3 domain.]

Figure 2-5: Summary of AP2/EREBP family Gene Structures in Maize. At least 5 alternate gene structures are found within the AP2/EREBP gene family in maize. Representative gene structures are summarized in the scaled (300:1) schematics and named Type A through Type C as follows. A1: Single exon gene with one AP2 domain, A2: Two exon gene with intron dividing single AP2 domain coding region, A3: Example of two exon gene with intron upstream of AP2 domain coding region, B: Multiple exon gene with introns within AP2 domain coding regions, C: AP2/EREBP-RAV gene with AP2 domain and B3 domain.

2.3.2 Phylogenetic Analysis of the EREBP family in maize

A total of 208 ZmEREBP gene models exhibited an identifiable AP2 domain and thus could be included in a multiple sequence alignment. The AP2 domain and 10 residues upstream and downstream were chosen as the operational taxonomic unit (OTU) and aligned using the MUSCLE algorithm. The multiple sequence alignment (MSA) was used to construct a phylogenetic tree (Fig. 2-6). The phylogenetic tree reveals three major clades and one subclade. The EREB clade, which is also termed the ERF clade, is composed of at least 90 EREBP family members that exhibit a single AP2 domain and an amino acid motif AAEIRD that has been associated with binding to a conserved GCC box (AGCCGCC) target sequence (Rihan et al., 2017). The DREB clade is composed of

[Figure 2-6 image: unrooted phylogenetic tree of 208 ZmEREBP gene models (scale bar 0.05); individual leaf labels are not reproduced here. Annotations in the extracted image text, in order of appearance: DREB clade; binds DRE and GCC box; single AP2 domain; RAV subclade; single AP2 domain; single B3 domain; AP2 clade; binds GCC box only; multiple AP2 domains; EREB (ERF) clade; GCC box AGCCGCC; binds GCC box but not DRE; single AP2 domain; DRE box A/GCCGAC.]

Figure 2-6: Phylogenetic Tree of 208 EREBP family members in Maize. The AP2 domain region of 208 ZmEREBP proteins was used as the operational taxonomic unit and a multiple sequence alignment (MSA) was performed using the MUSCLE algorithm with default parameters. The MSA was used to construct a phylogenetic tree, which was displayed as an unrooted tree using FigTree (V1.4.3). Four distinct clades/subclades are defined and the structural domains that are conserved within each clade are highlighted: EREB (ERF), green; DREB, pink; AP2, orange; RAV, red.

86 EREBP family members with a single AP2 domain but with a different VAEIRE/L amino acid motif at the corresponding position. This other motif has been associated with permitting binding not only to the GCC box but also to the drought responsive element (DRE), which has an A/GCCGAC consensus motif (Rihan et al., 2017). The third major clade comprises 27 AP2 EREBP family members, which exhibit two or more AP2 domains and multi-exon gene structures. The RAV subclade is composed of 5 EREBP members that exhibit a single-exon gene structure encoding both an N-terminal AP2 domain and a B3 domain. Tandem duplicates, such as ZmEREB179, ZmEREB180, ZmEREB181, and ZmEREB182, appear next to each other on the tree because their sequences are nearly identical. Syntelogs, such as ZmEREB151 and ZmEREB183, exhibit very short branch lengths on the phylogenetic tree because they are highly homologous (although they are not coexpressed, as shown later in Table 2.5).
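The two consensus motifs named above can be located in a DNA sequence with a simple pattern search. The following is a minimal sketch, not part of the original analysis: the regular-expression encoding of A/GCCGAC as `[AG]CCGAC`, the GCC box sequence AGCCGCC (taken from the Figure 2-6 labels), the function name `find_motifs` and the example sequence are all assumptions made for illustration, and only the forward strand is scanned.

```python
import re

# Consensus motifs; encoding them as regular expressions is an assumption.
GCC_BOX = re.compile(r"AGCCGCC")   # GCC box
DRE = re.compile(r"[AG]CCGAC")     # DRE, A/GCCGAC: first base is A or G


def find_motifs(seq: str) -> dict:
    """Return 0-based start positions of each motif on the forward strand."""
    seq = seq.upper()
    return {
        "GCC_box": [m.start() for m in GCC_BOX.finditer(seq)],
        "DRE": [m.start() for m in DRE.finditer(seq)],
    }


# Hypothetical promoter fragment, invented for illustration only.
promoter = "TTAGCCGCCAATTGCCGACTTACCGACGG"
hits = find_motifs(promoter)
print(hits)
```

A fuller scan would also search the reverse complement and allow for degenerate flanking bases; those steps are omitted here to keep the sketch short.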

2.3.3 Detection of Synteny amongst Maize EREBP family members

The occurrence of a WGD event in the maize lineage means that many genes are duplicated, and the detection of synteny permits likely orthologs to be discerned (Brohammer et al., 2018; Schnable and Freeling, 2011). Synteny analysis is normally applied across species, but when a WGD event has occurred it is appropriate to perform a self-synteny analysis to detect likely orthologs. The maize genome V4 was examined for self-synteny using the SynFind algorithm (Materials and Methods), which reported up to three syntelogs. Of a total of 149,669 features (including all gene variants) analyzed, 34,007 exhibited some level of synteny in the genome (Table 2.2). Of these, 86.77% exhibited a syntenic depth of 1, meaning that synteny was detected with one other locus in the genome; 10.64% and 2.20% exhibited syntenic depths of 2 and 3, respectively, and only 0.38% exhibited higher levels of synteny. We examined the level of synteny amongst all TF and CR models across all chromosomes and found that 1343 (51%) of TFs and 125 (40%) of CRs were syntenic within the maize genome (Table 2.2). Within the EREBP family, 129 (54.65%) of 236 gene models exhibited synteny with one or more loci in the genome. Of these, 65.8%, 22.5% and 12.5% exhibited syntenic depths of 1, 2 and 3, respectively (Table A3).

Table 2.2: Summary of Self-Synteny in Maize Genome V4

Synteny Depth         No. of Syntenic Features   Syntenic Depth as % of Total
Depth 1               29509                      86.77%
Depth 2               3620                       10.64%
Depth 3               748                        2.20%
Depth 4               83                         0.24%
Depth 5               18                         0.05%
Depth 6               9                          0.03%
Depth 7               4                          0.01%
Depth 8               10                         0.03%
Depth 9               6                          0.02%
Total Syn. Features   34007

TF and CR Location    # Syntenic TFs   # Syntenic CRs
chr 1                 198              21
chr 2                 168              11
chr 3                 128              13
chr 4                 132              16
chr 5                 158              20
chr 6                 94               7
chr 7                 101              4
chr 8                 156              14
chr 9                 110              10
chr 10                98               9
Total                 1343             125
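The percentage column of Table 2.2 follows directly from the feature counts. As a minimal sketch (the variable and function names are illustrative and not taken from the original analysis), the shares can be recomputed as follows, with the counts copied from the table:

```python
# Number of syntenic features at each syntenic depth, copied from Table 2.2.
depth_counts = {1: 29509, 2: 3620, 3: 748, 4: 83, 5: 18, 6: 9, 7: 4, 8: 10, 9: 6}

# Total number of syntenic features across all depths.
total = sum(depth_counts.values())


def percent_of_total(count: int, total: int) -> float:
    """Share of all syntenic features, as a percentage rounded to 2 decimals."""
    return round(100.0 * count / total, 2)


percentages = {d: percent_of_total(n, total) for d, n in depth_counts.items()}

# Features with a syntenic depth of 4 or more, pooled together.
higher = sum(n for d, n in depth_counts.items() if d >= 4)
higher_pct = percent_of_total(higher, total)

print(total, percentages, higher, higher_pct)
```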

Of the genes that exhibited synteny, it was possible to identify 46 pairs of EREB genes that exhibited the highest synteny with one another (reciprocal synteny) (Table 2.3). A further 12 and 2 pairs could be identified as syntenic with one another, although one member exhibited higher scores with 1 or 2 other regions in the genome, respectively (Table 2.3). Twenty-one of the known EREBP genes and 12 of the newly discovered EREBP genes were found to exhibit synteny with a genomic location that did not harbor another identifiable EREBP gene (Table A.4). In these instances, there is significant synteny of the flanking genomic regions, and it is likely that the syntelog has undergone deletion or mobilization following the WGD event.


Table 2.3: List of Syntenic EREB Gene Pairs in Maize

TF Name        V4 Name         Ranking  Syntelog Name  Syntelog V4 Name  Ranking  Reciprocal Synteny¹
ZmEREB1        Zm00001d000179  3        ZmEREB44       Zm00001d029884    4        R1
ZmEREB2        Zm00001d026191  4        ZmEREB195      Zm00001d017462    5        R1
ZmEREB3        Zm00001d006169  4        ZmEREB36       Zm00001d021205    3        R1
ZmEREB4        Zm00001d028524  4        ZmEREB173      Zm00001d047860    4        R1
ZmEREB5        Zm00001d051350  3        ZmEREB66       Zm00001d017478    2        R1
ZmEREB6        Zm00001d002744  3        ZmEREB104      Zm00001d017480    4        R1
ZmEREB7        Zm00001d052167  3        ZmEREB139      Zm00001d018158    3        R1
ZmEREB8        Zm00001d053707  4        ZmEREB138      Zm00001d015639    5        R1
ZmEREB10       Zm00001d017591  3        ZmEREB127      Zm00001d051451    3        R1
ZmEREB12       Zm00001d009622  4        ZmEREB34       Zm00001d039077    3        R1
ZmEREB13       Zm00001d052152  4        ZmEREB136      Zm00001d018191    3        R1
ZmEREB17       Zm00001d052229  4        ZmEREB18       Zm00001d018081    3        R1
ZmEREB19       Zm00001d036536  3        not yet named  Zm00001d014565    5        R1
ZmEREB23       Zm00001d021207  4        ZmEREB65       Zm00001d006170    4        R1
ZmEREB27       Zm00001d037165  4        not yet named  Zm00001d044950    3        R1
ZmEREB29       Zm00001d012584  4        ZmEREB158      Zm00001d043204    4        R1
ZmEREB30       Zm00001d017477  4        ZmEREB176      Zm00001d051355    4        R1
ZmEREB32       Zm00001d050787  1        ZmEREB157      Zm00001d016848    0        R1
ZmEREB38       Zm00001d027870  3        ZmEREB124      Zm00001d048297    4        R1
ZmEREB41       Zm00001d007840  5        not yet named  Zm00001d013170    4        R1
ZmEREB47       Zm00001d002079  4        ZmEREB156      Zm00001d026447    4        R1
ZmEREB54       Zm00001d024324  5        ZmEREB209      Zm00001d049364    4        R1
ZmEREB57       Zm00001d052102  3        not yet named  Zm00001d018288    2        R1
ZmEREB62       Zm00001d006653  3        ZmEREB201      Zm00001d021892    3        R1
ZmEREB64       Zm00001d019216  3        not yet named  Zm00001d005203    3        R1
ZmEREB65       Zm00001d006170  4        ZmEREB115      Zm00001d021208    4        R1
ZmEREB71       Zm00001d048208  4        ZmEREB101      Zm00001d028017    5        R1
ZmEREB76       Zm00001d017366  4        ZmEREB170      Zm00001d051239    3        R1
ZmEREB80       Zm00001d011499  3        ZmEREB208      Zm00001d044004    4        R1
ZmEREB81       Zm00001d035512  4        not yet named  Zm00001d010987    4        R1
ZmEREB83       Zm00001d012585  5        ZmEREB147      Zm00001d043205    5        R1
ZmEREB91       Zm00001d003871  3        ZmEREB125      Zm00001d025298    2        R1
ZmEREB102      Zm00001d020595  5        ZmEREB202      Zm00001d005798    5        R1
ZmEREB103      Zm00001d039424  2        ZmEREB186      Zm00001d008872    1        R1
ZmEREB104      Zm00001d017480  4        ZmEREB176      Zm00001d051355    4        R1
ZmEREB113      Zm00001d010175  3        ZmEREB133      Zm00001d038320    4        R1
ZmEREB118      Zm00001d008545  4        ZmEREB192      Zm00001d039989    5        R1
ZmEREB126      Zm00001d043782  4        ZmEREB162      Zm00001d038907    1        R1
ZmEREB145      Zm00001d036889  4        not yet named  Zm00001d046501    5        R1
ZmEREB146      Zm00001d009573  2        ZmEREB165      Zm00001d039019    2        R1
ZmEREB151      Zm00001d022488  4        ZmEREB183      Zm00001d007119    2        R1
ZmEREB162      Zm00001d038907  1        not yet named  Zm00001d011639    NA       R1
ZmEREB172      Zm00001d031796  3        ZmEREB211      Zm00001d005892    4        R1
ZmEREB184      Zm00001d034204  5        not yet named  Zm00001d013170    4        R1
ZmEREB197      Zm00001d002075  5        ZmEREB212      Zm00001d026448    4        R1
ZmEREB61       Zm00001d002760  1        ZmEREB107      Zm00001d017466    3        R1
not yet named  Zm00001d051340  1        not yet named  Zm00001d029940    5        R2
not yet named  Zm00001d026184  3        ZmEREB176      Zm00001d051355    4        R2
ZmEREB49       Zm00001d009468  3        ZmEREB162      Zm00001d038907    1        R2
ZmEREB86       Zm00001d025910  3        ZmEREB154      Zm00001d002867    4        R2
ZmEREB92       Zm00001d000339  4        ZmEREB158      Zm00001d043204    4        R2
ZmEREB93       Zm00001d010676  4        ZmEREB42       Zm00001d038584    4        R2
ZmEREB135      Zm00001d036251  1        ZmEREB175      Zm00001d045262    1        R2
ZmEREB136      Zm00001d018191  3        ZmEREB27       Zm00001d037165    4        R2
ZmEREB144      Zm00001d016260  2        ZmEREB39       Zm00001d046292    3        R2
ZmEREB154      Zm00001d002867  4        ZmEREB170      Zm00001d051239    3        R2
ZmEREB174      Zm00001d042717  1        ZmEREB133      Zm00001d038320    4        R2
ZmEREB204      Zm00001d032295  5        ZmEREB109      Zm00001d020267    4        R2
not yet named  Zm00001d026182  4        ZmEREB176      Zm00001d051355    4        R3
ZmEREB31       Zm00001d044857  5        ZmEREB17       Zm00001d052229    4        R3

¹ R1 indicates that each member of the pair exhibited reciprocal synteny in the search. R2 indicates that one member of the pair exhibited higher synteny with 1 other gene. R3 indicates that one member of the pair exhibited higher synteny with 2 other genes.
V4 Name indicates the gene name within Maize Genome V4.
Ranking indicates the gene model cloning ranking based on RNA-seq support.

In summary, it was possible to discern that about half of the EREBP genes were duplicated in the genome, with 60 high-quality syntelog pairs (Table 2.3). This represents an opportunity to explore the consequences of gene duplication on the evolution of this family, particularly with regard to conservation or divergence of expression patterns.
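The R1/R2/R3 labels of Table 2.3 can be read as a rule on ranked syntelog lists. The sketch below is one possible reading of the table footnote, not the procedure actually used: it assumes each gene has a list of reported syntelogs ordered from best to worst score, and it labels a pair by the larger number of hits that outrank the partner in either list. The function name `reciprocal_class`, the data structure and the gene names are hypothetical.

```python
from typing import Dict, List, Optional


def reciprocal_class(a: str, b: str, ranked: Dict[str, List[str]]) -> Optional[str]:
    """Label a candidate pair R1, R2 or R3; return None if not mutually syntenic.

    ranked[g] lists the syntelogs reported for gene g, best-scoring first.
    """
    try:
        pos_ab = ranked[a].index(b)  # number of hits that outrank b for gene a
        pos_ba = ranked[b].index(a)  # number of hits that outrank a for gene b
    except (KeyError, ValueError):
        return None  # one gene does not list the other at all
    worst = max(pos_ab, pos_ba)
    # Up to three syntelogs are assumed per gene, so positions 0..2 map to R1..R3.
    return f"R{worst + 1}" if worst <= 2 else None


# Hypothetical ranked hit lists (placeholder gene names, not thesis data).
ranked_hits = {
    "geneA": ["geneB"],
    "geneB": ["geneA"],
    "geneC": ["geneD", "geneE"],
    "geneD": ["geneC"],
    "geneE": ["geneC"],
}

print(reciprocal_class("geneA", "geneB", ranked_hits))
print(reciprocal_class("geneC", "geneE", ranked_hits))
print(reciprocal_class("geneA", "geneC", ranked_hits))
```

Under this reading, a pair in which each gene is the other's best hit is R1, and a pair in which one gene ranks its partner second is R2.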

2.3.4 Coexpression Analysis of Maize EREBP family members

The consequences of gene duplication on expression of the EREBP gene family were explored by examining the coexpression patterns of both syntenic and non-syntenic EREBP family members in maize. Transcriptomic expression data (as FPKM values) from 48 different tissues were downloaded, and coexpression analysis was performed on all EREBP family members (both syntenic and non-syntenic) using the Hmisc package in an R environment (Materials and Methods). It was found that many syntenic pairs were highly coexpressed, with at least 21 pairs exhibiting correlation values of 0.77 or higher (Table 2.4 and Fig. 2-7).
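The coexpression value for a gene pair is a correlation coefficient between the two genes' FPKM profiles across tissues. The analysis itself was run with the Hmisc package in R; the sketch below is only a plain-Python illustration of the idea. It assumes a Pearson correlation, which the text does not state, and the function name `pearson` and the FPKM vectors are invented for the example.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical FPKM values across five tissues (not data from this study).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 10.0]  # rises and falls together with gene_a
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]   # opposite pattern to gene_a

corr_ab = pearson(gene_a, gene_b)
corr_ac = pearson(gene_a, gene_c)
print(corr_ab, corr_ac)
```

In practice each profile would hold one FPKM value per tissue in the dataset, and the calculation would be repeated for every pair of EREBP family members.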

Table 2.4: List of syntenic EREB family members that are highly coexpressed

TF Name          FPKM¹  Highest Expressed Tissue²          Corr  Coexpressed Syntelog   FPKM   Highest Expressed Tissue
ZmEREB17         102.7  21 old leaves seedling field       0.96  ZmEREB18               129.4  21 old leaves seedling field
ZmEREB2          11.2   Leaf Meristems Drought stressed    0.93  ZmEREB198              14.3   Ovaries Drought stressed
ZmEREB97         41.2   Embryo 14DAP                       0.91  ZmEREB209              76.5   Embryo 14DAP
ZmEREB205        94.8   Embryo 14DAP                       0.9   ZmEREB97               41.2   Embryo 14DAP
ZmEREB83         28.1   Embryo 14DAP                       0.89  ZmEREB147              53.1   Ovaries Drought stressed
ZmEREB8          7.2    Embryo 14DAP                       0.87  ZmEREB138              16.8   Whole anthers
ZmEREB31         7.2    Leaf Meristems                     0.86  ZmEREB17               102.7  20 old leaves seedling field
ZmEREB64         47.8   Embryo 14DAP                       0.86  Zm00001d005203         15.3   Embryo 14DAP
ZmEREB118        76.1   Embryo 14DAP                       0.86  ZmEREB192              77.8   Endosperm 14DAP
ZmEREB19         7.6    pre pollination tassel             0.86  Zm00001d014565         19     post pollination
ZmEREB47         22.3   Endosperm 14DAP                    0.85  ZmEREB156              19.8   Ovaries Drought stressed
ZmEREB76         2      Leaf Meristems                     0.85  ZmEREB170              0.9    Leaves 20 days old seedling gc
ZmEREB145        4.9    Ovaries 1 DAP                      0.85  Zm00001d046501         11.1   Ear DAP 10
ZmEREB208        8.7    Whole anthers                      0.81  ZmEREB80               7.1    Whole anthers
ZmEREB62         0.8    Ovule B73                          0.78  ZmEREB201              2      Ovule B73
ZmEREB110*       20.5   Embryo 14DAP                       0.78  ZmEREB81               95.2   Leaves 20 days old seedling gc
ZmEREB136        5.5    Seedling shoots                    0.77  ZmEREB13               9.3    Leaf Meristems Drought stressed
ZmEREB173        28     B73 pollen                         0.77  ZmEREB4                4.6    B73 pollen
Zm00001d011639*  6.3    Mature silk                        0.79  ZmEREB126              10.4   Endosperm 14DAP

* = uncloned in maize TFome 1.0
¹ FPKM = Fragments Per Kilobase Million in the tissue in which highest expression of this gene was observed.
² Highest Expressed Tissue = maize tissue for which the FPKM value is greatest amongst the 66 RNA-Seq datasets used in this study. DAP = days after pollination.

Figure 2-7: Schematic of syntenic ZmEREBP family members that are highly coexpressed. Schematic shows chromosomal location of syntenic ZmEREBP genes listed in Table 2.4. The colored ribbons show the syntenic block regions as determined by self-synteny of the maize genome V4 using SyMap4.2.

It was determined that not only are these syntenic pairs highly coexpressed across 48 tissue samples, but also that the tissue in which each member of the pair exhibits its highest expression is most often the same; e.g., both ZmEREB17 and ZmEREB18 are most highly expressed in 21-day-old seedling leaves (Table 2.4). The high coexpression of syntelogs is suggestive of a conservation of gene function following genome duplication and suggests that these gene pairs have overlapping and redundant functions in plants. Equally interesting is the finding that some syntenic EREBP pairs are not coexpressed (Table 2.5).

Table 2.5: List of syntenic EREB family members that are not coexpressed

EREBP (A)        FPKM¹  Highest Expressed Tissue²          Corr  Coexpressed EREBP (B)  FPKM   Highest Expressed Tissue
ZmEREB61         2.9    pre emergence cob                  0.26  ZmEREB107              5.7    Embryo 14DAP
ZmEREB32         0.5    Developing Leaf                    0.29  ZmEREB157              0.4    Whole anthers
ZmEREB38         9.2    mature silk                        0.3   ZmEREB48               12.5   B73 pollen
ZmEREB57         27.1   Bundle sheath cells                0.36  Zm00001d018288         3.2    Seedling shoots
ZmEREB151        1.7    Ear primordia 1                    0.36  ZmEREB183              0.4    Mature leaf
ZmEREB63         17.3   Seedling shoots                    0.38  ZmEREB84               5.2    Seedling shoots
ZmEREB113        8.1    mature silk                        0.39  ZmEREB133              6.9    Ovaries Drought stressed
ZmEREB91         10.8   B73 pollen                         0.39  ZmEREB125              1.7    mature silk
ZmEREB103        0.7    Developing Leaf                    0.4   ZmEREB186              1.7    Embryo 14DAP
ZmEREB86         0.4    mature silk                        0.47  ZmEREB154              3.6    Embryo 14DAP
ZmEREB39         1.0    mature silk                        0.49  ZmEREB144              0.9    Ovule B73
ZmEREB23         2.3    Leaf Meristems Drought stressed    0.51  ZmEREB65               8.7    Leaf Meristems Drought stressed
ZmEREB41         23.0   Shoot apex 1                       0.51  Zm00001d013170         4.8    Embryo 14DAP
ZmEREB30         2.8    DAP 10                             0.52  Zm00001d026184         0.9    Ovaries 1 DAP
ZmEREB12         1.1    5 DAP Seed                         0.55  ZmEREB34               2.1    mature silk
ZmEREB6          2.5    Embryo 14DAP                       0.59  ZmEREB104              10.0   Developing Leaf
ZmEREB10         11.5   Leaf Meristems Drought stressed    0.6   ZmEREB127              11.6   Leaf Meristems Drought stressed
ZmEREB172        40.1   Leaf Meristems Drought stressed    0.61  ZmEREB211              19.3   Seedling shoots
ZmEREB146        1.0    Leaves 20 days old seedling gc     0.61  ZmEREB165              1.1    Embryo 14DAP
ZmEREB131        4.9    Ovaries Drought stressed           0.63  ZmEREB149              3.3    Embryo 14DAP
ZmEREB5          0.9    Embryo 14DAP                       0.66  ZmEREB66               1.2    Embryo 14DAP
ZmEREB197        11.3   Ovaries 1 DAP                      0.68  ZmEREB212              5.7    Endosperm 14DAP
ZmEREB71         14.6   Leaf Meristems Drought stressed    0.69  ZmEREB101              7.1    DAP 10
ZmEREB7          7.8    Seedling shoots                    0.69  ZmEREB139              14.4   Seedling shoots

¹ FPKM = Fragments Per Kilobase Million in the tissue in which highest expression of this gene was observed.
² Highest Expressed Tissue = maize tissue for which the FPKM value is greatest amongst the 66 RNA-Seq datasets used in this study. DAP = days after pollination.

For example, ZmEREB61 and ZmEREB107 have a coexpression correlation value of only 0.26, suggesting that this gene pair has diverged in temporal or spatial expression since the WGD event in maize. In some instances, the expression of one syntelog is greatly reduced, which is suggestive of a degenerative loss of function. Lastly, it was found that genes need not be syntenic in order to be highly coexpressed (Table 2.6). For example, ZmEREB97 and ZmEREB209 are more highly coexpressed with each other (Corr value = 0.91) than with their syntelogs. Such genes may represent orthologs that have been mobilized in the genome via transposition or that have lost collinearity due to recombination.
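The three outcomes described in this section (syntenic and coexpressed, syntenic but diverged, non-syntenic but coexpressed) amount to a two-way classification of each gene pair. The following is a minimal sketch of that bookkeeping. The single cut-off of 0.77 is an assumption taken from the lowest value in Table 2.4 (the pairs in Table 2.5 all fall at 0.69 or below), the function name `categorize` is hypothetical, and the three example pairs are taken from Tables 2.4, 2.5 and 2.6.

```python
THRESHOLD = 0.77  # assumed cut-off for "highly coexpressed"


def categorize(syntenic: bool, corr: float, threshold: float = THRESHOLD) -> str:
    """Place a gene pair in one of four synteny/coexpression categories."""
    high = corr >= threshold
    if syntenic and high:
        return "syntenic, coexpressed"
    if syntenic:
        return "syntenic, diverged expression"
    if high:
        return "non-syntenic, coexpressed"
    return "non-syntenic, not coexpressed"


# (gene A, gene B, syntenic?, correlation), one example from each table.
pairs = [
    ("ZmEREB17", "ZmEREB18", True, 0.96),    # Table 2.4
    ("ZmEREB61", "ZmEREB107", True, 0.26),   # Table 2.5
    ("ZmEREB52", "ZmEREB159", False, 0.93),  # Table 2.6
]

labels = {(a, b): categorize(syn, corr) for a, b, syn, corr in pairs}
for pair, label in labels.items():
    print(pair, label)
```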


Table 2.6: List of non-syntenic EREB family members that are highly coexpressed

EREBP (A)        FPKM¹  Highest Expressed Tissue²          Corr  Coexpressed EREBP (B)  FPKM   Highest Expressed Tissue
ZmEREB97         41.2   Embryo 14DAP                       0.91  ZmEREB209              76.5   Embryo 14DAP
ZmEREB96         4.6    post pollination                   0.91  Zm00001d014565         19.0   post pollination
ZmEREB109        187.7  Embryo 14DAP                       0.91  ZmEREB159              44.2   pre emergence cob
ZmEREB52         31.8   Embryo 14DAP                       0.93  ZmEREB159              44.2   pre emergence cob
ZmEREB67         12.2   Ovaries Drought stressed           0.9   ZmEREB202              96.5   Ovaries Drought stressed
ZmEREB54         79.1   Embryo 14DAP                       0.9   ZmEREB182              74.6   Leaf Meristems Drought stressed
ZmEREB67         12.2   Ovaries Drought stressed           0.9   ZmEREB197              11.3   Ovaries 1 DAP
ZmEREB126        10.4   Endosperm 14DAP                    0.9   ZmEREB174              3.6    Leaf Meristems Drought stressed
ZmEREB83         28.1   Embryo 14DAP                       0.89  ZmEREB147              53.1   Ovaries Drought stressed
ZmEREB45         14.7   Whole anthers                      0.89  ZmEREB96               4.6    post pollination
ZmEREB54         79.1   Embryo 14DAP                       0.89  ZmEREB196              25.5   Ovaries Drought stressed
ZmEREB196        25.5   Ovaries Drought stressed           0.89  ZmEREB204              66.8   Embryo 14DAP
ZmEREB40         34.9   Leaves 20 days old seedling gc     0.89  ZmEREB54               79.1   Embryo 14DAP
ZmEREB96         4.6    post pollination                   0.89  ZmEREB138              16.8   Whole anthers
ZmEREB181        12.0   Leaf Meristems Drought stressed    0.89  ZmEREB182              74.6   Leaf Meristems Drought stressed
ZmEREB182        74.6   Leaf Meristems Drought stressed    0.89  ZmEREB209              76.5   Embryo 14DAP
ZmEREB52         31.8   Embryo 14DAP                       0.88  ZmEREB109              187.7  Embryo 14DAP
ZmEREB67         12.2   Ovaries Drought stressed           0.88  ZmEREB147              53.1   Ovaries Drought stressed
ZmEREB109        187.7  Embryo 14DAP                       0.87  ZmEREB121              17.6   Embryo 14DAP
ZmEREB56         11.7   Leaf Meristems Drought stressed    0.87  ZmEREB105              20.1   Leaf Meristems Drought stressed
ZmEREB45         14.7   Whole anthers                      0.87  Zm00001d014565         19.0   post pollination
ZmEREB54         79.1   Embryo 14DAP                       0.87  ZmEREB181              12.0   Leaf Meristems Drought stressed
ZmEREB17         102.7  21 days old leaves seedling field  0.86  ZmEREB31               7.2    Leaf Meristems
ZmEREB139        14.4   Seedling shoots                    0.86  ZmEREB191              1.8    5 DAP Seed
ZmEREB54         79.1   Embryo 14DAP                       0.86  ZmEREB97               41.2   Embryo 14DAP
ZmEREB67         12.2   Ovaries Drought stressed           0.86  ZmEREB160              98.5   5 DAP Seed
ZmEREB40         34.9   Leaves 20 days old seedling gc     0.86  ZmEREB209              76.5   Embryo 14DAP
ZmEREB121        17.6   Embryo 14DAP                       0.86  ZmEREB160              98.5   5 DAP Seed

¹ FPKM = Fragments Per Kilobase Million in the tissue in which highest expression of this gene was observed.
² Highest Expressed Tissue = maize tissue for which the FPKM value is greatest amongst the 66 RNA-Seq datasets used in this study. DAP = days after pollination.

2.4 Discussion and Conclusions

The main goal of this study was to examine the effect of a whole genome duplication (WGD) event on the repertoire and function of a family of TFs in maize. The ZmEREBP family was chosen not only because it is a plant-specific TF family with important roles in growth and development, but also because it is a large family and should yield more duplicate genes for comparison. The V4 reference genome was scanned for new EREBP family members. Of the 212 ZmEREBP models detected in the V3 reference genome, only one was not found in the V4 annotation. However, our colleagues detected 13 more EREBP models by scanning the genome for genes encoding proteins with AP2 motifs. These 13 new models were also found independently in our lab. Self-synteny analysis identified an additional 11 models, for a total of 24 new EREBP models (an 11% increase). A total of 236 models were predicted to be ZmEREBP members, and transcriptomic data were used to refine these models as outlined in Chapter 1. Self-synteny is therefore useful in finding gene models that may have eluded genome scanning and ab initio prediction or that were poorly annotated. An example of this was ZmEREB162, which was found to be syntenic with Zm00001d011639. The latter gene had previously been classified as an ABI3-type TF, but a BLAST search and sequence alignment indicated that it ought to be reclassified as an AP2/EREBP-RAV due to the presence of both an AP2 domain and a B3 domain, whereas ABI3 genes have only B3 domains (Kagaya et al., 1999; Yamasaki et al., 2004). None of the other maize ABI3 genes had an AP2 domain. There is a close relationship between AP2/EREBP and ABI3/VP1 genes (Fig. 2-1), but the presence of an AP2 domain always classifies a gene as an AP2/EREBP, regardless of whether another domain is present within the sequence. The addition of new members to the EREBP family due to an improved genome is likely incomplete, as some models remain very uncertain. For example, ZmOrphan299 was not included in the MaizeGDB V3 to V4 conversion list, but a GenBank search revealed a match to the Zm00001d028069 gene model, which was found to encode multiple AP2 domains. Furthermore, it mapped near ZmEREB120 (Zm00001d028066) on chr1. Both genes had poor read support (score = 1) and exhibited at least 9 AP2 domains. The structure of these genes did not resemble that of any other member of the AP2/EREBP family. It is likely that this region of the genome harbors several tandem repeats, making it difficult to sequence and map reads. The current gene models for such genes likely represent “collapsed” gene models that await resolution by higher resolution sequencing. In conclusion, there are at least 236 ZmEREBP family members, but possibly more exist. However, gaps in the reference genome, as well as likely incorrect models such as those described above, permitted the inclusion of just 208 in a phylogenetic analysis of the family based on using the conserved AP2 domain as the OTU for alignment.
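The domain-based reclassification described above can be summarized as a small decision rule. The sketch below is an illustration only: the function name `classify_by_domains` and the label strings are hypothetical, the domain counts are assumed to come from a separate domain search, and the rule does not separate the ERF and DREB clades because that depends on amino acid motifs within the AP2 domain rather than on domain counts. Treating any protein with both an AP2 and a B3 domain as RAV, regardless of the number of AP2 domains, is also an assumption.

```python
def classify_by_domains(n_ap2: int, n_b3: int) -> str:
    """Assign a coarse family label from counts of AP2 and B3 domains."""
    if n_ap2 == 0:
        # Without an AP2 domain the protein is not an AP2/EREBP member.
        return "ABI3/VP1-type (B3 only)" if n_b3 > 0 else "unclassified"
    if n_b3 > 0:
        # An AP2 domain takes precedence; AP2 plus B3 marks the RAV subclade.
        return "AP2/EREBP-RAV"
    if n_ap2 >= 2:
        return "AP2/EREBP-AP2"
    return "AP2/EREBP-ERF or DREB"


# Example: one AP2 domain plus one B3 domain, as described for Zm00001d011639.
print(classify_by_domains(n_ap2=1, n_b3=1))
print(classify_by_domains(n_ap2=0, n_b3=1))
print(classify_by_domains(n_ap2=2, n_b3=0))
```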

The phylogenetic analysis of the AP2/EREBP family divided genes into clades based on subtype and binding capabilities. The four main clades are the EREB (ERF), DREB, AP2, and RAV groups. The largest groups are the EREB and DREB clades, which differ only slightly in their target affinities, yet this difference reflects significant divergence in the target gene sets that they regulate. DREB TFs play a significant role in regulating genes involved in cold acclimation in plants. Although not conducted here, it would be of interest to compare the repertoire of these subclades across species that differ in their cold acclimation abilities.

Once the EREBP repertoire was established, a self-synteny analysis of the maize genome V4 was performed so that EREBP syntelogs could be identified. About 55% of the EREBP family exhibited synteny, and among these about 60 syntenic pairs could be identified, of which 46 pairs exhibited reciprocal syntenic relationships in the synteny search. These genes could represent orthologous genes within the maize genome and possibly have overlapping or redundant functions. Other EREBP genes that exhibit synteny with a part of the genome that does not harbor a homologous gene (Table A-4) could represent instances where the duplicate gene has either been lost since the WGD event or has been mobilized to another part of the genome. Alternatively, gaps in the genome may not currently permit the syntelog to be discerned, and future genome revisions may reveal homologs in these syntenic regions.

The relationship between syntenic gene pairs was further explored by examining the coexpression of EREBP family members. The 21 syntenic EREBP gene pairs that exhibited high coexpression values (>0.75) likely represent duplicate genes in the genome that have retained most of their functional activity since the WGD event. ZmEREB110 (RAP2) has been studied and linked to the progression of flowering stage in maize and the temperature response (Jamann et al., 2017). It was found that this is a syntelog of ZmEREB81, but the role of the latter has not been investigated. In general, the highly coexpressed syntenic EREBP gene pairs were often found to be expressed at their highest levels in the same tissues, and this is relevant to researchers who are studying one of the members of these pairs. Caution should be taken, however, as the ZmEREB81 gene mentioned above is expressed at its highest level in seedling tissue (FPKM = 95) but its syntelog ZmEREB110 is expressed at its highest level (FPKM = 20) in embryos 14 days after pollination. Another example of a highly coexpressed EREBP pair is ZmEREB17 and ZmEREB18 (corr = 0.96), the former of which has been linked to C4 photosynthesis regulation (Perduns et al., 2015). The high level of coexpression of these genes, as well as the fact that they are both maximally expressed in seedling leaves, strongly suggests a redundant function. The presence of such likely redundant syntelogs would make single-gene knockout studies more difficult, as one of the genes could replace the other functionally. There are 61 transcription factors within the classical maize gene list. Twenty-six of them exhibit one, two, or three syntelogs. Interestingly, 9 of these 26 syntelogs are reciprocal with another classical TF. It is unsurprising that the majority of the classical TFs lack a syntelog, as they typically have visible phenotypes (i.e., not masked by a duplicate gene function). A mutant phenotype could possibly require mutations within a gene and its syntelog. However, since syntenic genes are also highly homologous, it may be possible to engineer double knockouts using new gene editing technology such as CRISPR. A similar consideration may be applied to tandem duplications with high coexpression values, such as ZmEREB181 (Zm00001d027928) and ZmEREB182 (Zm00001d027929) (corr = 0.89). These duplicate pairs likely arose after the WGD event and also exhibit highly homologous sequences.

It is also of interest to document syntenic pairs that differ in their expression patterns. For example, the syntenic pair ZmEREB32 and ZmEREB157 is not coexpressed (corr = 0.29). ZmEREB32 is expressed maximally in developing leaves and the leaf meristem, whereas ZmEREB157 is expressed maximally in anthers and pollen grains. This divergence in the expression profiles of duplicate genes likely reflects a divergence in gene function and regulation since the maize WGD event. Therefore, these gene pairs are unlikely to participate in the same GRNs.

Several non-syntenic EREBs were found to be highly coexpressed, for example ZmEREB52 (Zm00001d042588) and ZmEREB159 (Zm00001d034605) (corr = 0.93). Both show overlapping expression in the pre-emergence cob and other female tissues. These genes may reflect duplicate genes that lost a collinear relationship due to chromosomal recombination or rearrangements, or mobilization via transposition. Alternatively, they may have converged on a similar expression pattern, but this is infrequent from an evolutionary point of view.

Self-synteny analysis can serve as a powerful tool for discovering new TFs and CRs. The AP2/EREBP family was chosen due to its large size and the correspondingly increased likelihood of discovering syntelogs. If these synteny analysis methods were applied to the entire TFome, every family would most likely be expanded, since 10 of the syntelogs discovered in this family alone were not detected in the V4 HMM genome scan. A gene within a syntenic block without a duplicate suggests that its duplicate copy was deleted or is no longer functional. The WGD event created two copies of every gene 25 MYA, yet the effects were not simply additive. Self-synteny analysis thus provides clues about which genes no longer exist. Coexpression analysis can also help predict the copy number requirement of these duplicate genes. For example, the function of the unpaired gene within a syntenic block could have been deleterious to maize if present in multiple copies. When duplicate TF genes are retained and expressed at similar levels, it suggests that the copy number is less critical for overall plant growth and development.

In conclusion, it was found that about half of the ZmEREBP family are duplicate genes, and among them there are examples of both conserved and divergent expression patterns. In addition, there are non-syntenic EREBP genes whose syntelog may have been lost or may lie elsewhere in the genome due to chromosomal rearrangements. It is expected that this outcome will be shared by other TF and CR families, and it is a useful consideration when studying mutant phenotypes or creating knockouts for functional analysis.


References

Allen, M.D., Yamasaki, K., Ohme-Takagi, M., Tateno, M., and Suzuki, M. (1998). A novel mode of DNA recognition by a beta-sheet revealed by the solution structure of the GCC-box binding domain in complex with DNA. EMBO (European Molecular Biology Organization) Journal 17, 5484-5496.

Bolduc, N., Yilmaz, A., Mejia-Guerra, M.K., Morohashi, K., O'Connor, D., Grotewold, E., and Hake, S. (2012). Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26, 1685-1690.

Breton, G., Kay, S.A., and Pruneda-Paz, J.L. (2016). Identification of Arabidopsis Transcriptional Regulators by Yeast One-Hybrid Screens Using a Transcription Factor ORFeome. Methods in molecular biology (Clifton, NJ) 1398, 107-118.

Brohammer, A.B., Kono, T.J.Y., Springer, N.M., McGaugh, S.E., and Hirsch, C.N. (2018). The limited role of differential fractionation in genome content variation and function in maize (Zea mays L.) inbred lines. Plant Journal 93, 131-141.

Burdo, B., Gray, J., Goetting-Minesky, M.P., Wittler, B., Hunt, M., Li, T., Velliquette, D., Thomas, J., Gentzel, I., dos Santos Brito, M., et al. (2014). The Maize TFome - development of a transcription factor open reading frame collection for functional genomics. Plant J 80, 356-366.

Chai, C.L., Wang, Y.Q., Joshi, T., Valliyodan, B., Prince, S., Michel, L., Xu, D., and Nguyen, H.T. (2015). Soybean transcription factor ORFeome associated with drought resistance: a valuable resource to accelerate research on abiotic stress resistance. BMC Genomics 16.

Chang, Y.M., Liu, W.Y., Shih, A.C.C., Shen, M.N., Lu, C.H., Lu, M.Y.J., Yang, H.W., Wang, T.Y., Chen, S.C.C., Chen, S.M., et al. (2012). Characterizing Regulatory and Functional Differentiation between Maize Mesophyll and Bundle Sheath Cells by Transcriptomic Analysis. Plant Physiol 160, 165-177.

Charfeddine, M., Bouaziz, D., Charfeddine, S., Hammami, A., Ellouz, O.N., and Bouzid, R.G. (2015). Overexpression of dehydration-responsive element-binding 1 protein (DREB1) in transgenic Solanum tuberosum enhances tolerance to biotic stress. Plant Biotechnology Reports 9, 79-88.

Chen, L.H., Han, J.P., Deng, X.M., Tan, S.L., Li, L.L., Li, L., Zhou, J.F., Peng, H., Yang, G.X., He, G.Y., et al. (2016). Expansion and stress responses of AP2/EREBP superfamily in Brachypodium distachyon. Sci Rep 6.

Chettoor, A., Givan, S., Cole, R., Coker, C., Unger-Wallace, E., Vejlupkova, Z., Vollbrecht, E., Fowler, J., and Evans, M. (2014). Discovery of novel transcripts and gametophytic functions via RNA-seq analysis of maize gametophytic transcriptomes. Genome Biol 15.

Chuck, G., Muszynski, M., Kellogg, E., Hake, S., and Schmidt, R.J. (2002). The control of spikelet meristem identity by the branched silkless1 gene in maize. Science 298, 1238-1241.

Davidson, R.M., Hansey, C.N., Gowda, M., Childs, K.L., Lin, H.N., Vaillancourt, B., Sekhon, R.S., de Leon, N., Kaeppler, S.M., Jiang, N., et al. (2011). Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome 4, 191-203.

Forestan, C., Farinati, S., Rouster, J., Lassagne, H., Lauria, M., Dal Ferro, N., and Varotto, S. (2018). Control of Maize Vegetative and Reproductive Development, Fertility, and rRNAs Silencing by HISTONE DEACETYLASE 108. Genetics 208, 1443-1466.

Gray, J., Burdo, B., Goetting-Minesky, M.P., Wittler, B., Hunt, M., Li, T., Velliquette, D., Thomas, J., Agarwal, T., Key, K., et al. (2015). Protocol for the Generation of a Transcription Factor Open Reading Frame Collection (TFome). Bio-protocol 5.

Huang, H.H., Xie, S.D., Xiao, Q.L., Wei, B., Zheng, L.J., Wang, Y.B., Cao, Y., Zhang, X.G., Long, T.D., Li, Y.P., et al. (2016). Sucrose and ABA regulate starch biosynthesis in maize through a novel transcription factor, ZmEREB156. Sci Rep 6.

Jamann, T.M., Sood, S., Wisser, R.J., and Holland, J.B. (2017). High-Throughput Resequencing of Maize Landraces at Genomic Regions Associated with Flowering Time. PLoS One 12.

Jiang, F.K., Guo, M., Yang, F., Duncan, K., Jackson, D., Rafalski, A., Wang, S.C., and Li, B.L. (2012). Mutations in an AP2 Transcription Factor-Like Gene Affect Internode Length and Leaf Shape in Maize. PLoS One 7.

Jiao, Y.P., Peluso, P., Shi, J.H., Liang, T., Stitzer, M.C., Wang, B., Campbell, M.S., Stein, J.C., Wei, X.H., Chin, C.S., et al. (2017). Improved maize reference genome with single-molecule technologies. Nature 546, 524-527.

Johnston, R., Wang, M.H., Sun, Q., Sylvester, A.W., Hake, S., and Scanlon, M.J. (2014). Transcriptomic Analyses Indicate That Maize Ligule Development Recapitulates Gene Expression Patterns That Occur during Lateral Organ Initiation. Plant Cell 26, 4718-4732.

Jonczyk, M., Sobkowiak, A., Trzcinska-Danielewicz, J., Skoneczny, M., Solecka, D., Fronk, J., and Sowinski, P. (2017). Global analysis of gene expression in maize leaves treated with low temperature. II. Combined effect of severe cold (8 °C) and circadian rhythm. Plant Mol Biol 95, 279-302.

Kagaya, Y., Ohmiya, K., and Hattori, T. (1999). RAV1, a novel DNA-binding protein, binds to bipartite recognition sequence through two distinct DNA-binding domains uniquely found in higher plants. Nucleic Acids Res 27, 470-478.

Kakumanu, A., Ambavaram, M.M.R., Klumas, C., Krishnan, A., Batlang, U., Myers, E., Grene, R., and Pereira, A. (2012). Effects of Drought on Gene Expression in Maize Reproductive and Leaf Meristem Tissue Revealed by RNA-Seq. Plant Physiol 160, 846-867.

Kizis, D., and Pages, M. (2002). Maize DRE-binding proteins DBF1 and DBF2 are involved in rab17 regulation through the drought-responsive element in an ABA-dependent pathway. Plant Journal 30, 679-689.

Klucher, K.M., Chow, H., Reiser, L., and Fischer, R.L. (1996). The AINTEGUMENTA gene of Arabidopsis required for ovule and female gametophyte development is related to the floral homeotic gene APETALA2. Plant Cell 8, 137-153.

Lambert, S.A., Jolma, A., Campitelli, L.F., Das, P.K., Yin, Y.M., Albu, M., Chen, X.T., Taipale, J., Hughes, T.R., and Weirauch, M.T. (2018). The Human Transcription Factors. Cell 172, 650-665.

Lamesch, P., Milstein, S., Hao, T., Rosenberg, J., Li, N., Sequerra, R., Bosak, S., Doucette-Stamm, L., Vandenhaute, J., Hill, D.E., et al. (2004). C. elegans ORFeome version 3.1: Increasing the coverage of ORFeome resources with improved gene predictions. Genome Res 14, 2064-2069.

Li, L., Briskine, R., Schaefer, R., Schnable, P.S., Myers, C.L., Flagel, L.E., Springer, N.M., and Muehlbauer, G.J. (2016). Co-expression network analysis of duplicate genes in maize (Zea mays L.) reveals no subgenome bias. BMC Genomics 17, 875.

Li, P.H., Ponnala, L., Gandotra, N., Wang, L., Si, Y.Q., Tausta, S.L., Kebrom, T.H., Provart, N., Patel, R., Myers, C.R., et al. (2010). The developmental dynamics of the maize leaf transcriptome. Nature Genet 42, 1060-1067.

Liu, D., Hunt, M., and Tsai, I.J. (2018). Inferring synteny between genome assemblies: a systematic evaluation. BMC Bioinformatics 19, 26.

Liu, S., Wang, X., Wang, H., Xin, H., Yang, X., Yan, J., Li, J., Tran, L.S., Shinozaki, K., Yamaguchi-Shinozaki, K., et al. (2013). Genome-wide analysis of ZmDREB genes and their association with natural variation in drought tolerance at seedling stage of Zea mays L. PLoS Genet 9, e1003790.

Matias-Hernandez, L., Aguilar-Jaramillo, A.E., Marin-Gonzalez, E., Suarez-Lopez, P., and Pelaz, S. (2014). RAV genes: regulation of floral induction and beyond. Annals of Botany (London) 114, 1459-1470.

Matsuyama, A., Arai, R., Yashiroda, Y., Shirai, A., Kamata, A., Sekido, S., Kobayashi, Y., Hashimoto, A., Hamamoto, M., Hiraoka, Y., et al. (2006). ORFeome cloning and global analysis of protein localization in the fission yeast Schizosaccharomyces pombe (vol 24, pg 841, 2006). Nat Biotechnol 24, 1033-1033.

Navarro, J.A.R., Willcox, M., Burgueno, J., Romay, C., Swarts, K., Trachsel, S., Preciado, E., Terron, A., Delgado, H.V., Vidal, V., et al. (2017). A study of allelic diversity underlying flowering-time adaptation in maize landraces. Nature Genet 49, 476-480.

Nishiyama, M.Y., Ferreira, S.S., Tang, P.Z., Becker, S., Portner-Taliana, A., and Souza, G.M. (2014). Full-Length Enriched cDNA Libraries and ORFeome Analysis of Sugarcane Hybrid and Ancestor Genotypes. PLoS One 9.

Perdomo-Sabogal, A., Kanton, S., Walter, M.B.C., and Nowick, K. (2014). The role of gene regulatory factors in the evolutionary history of humans. Curr Opin Genet Dev 29, 60-67.

Perduns, R., Horst-Niessen, I., and Peterhansel, C. (2015). Photosynthetic Genes and Genes Associated with the C4 Trait in Maize Are Characterized by a Unique Class of Highly Regulated Histone Acetylation Peaks on Upstream Promoters. Plant Physiology (Rockville) 168, 1378.

Pruneda-Paz, J.L., Breton, G., Nagel, D.H., Kang, S.E., Bonaldi, K., Doherty, C.J., Ravelo, S., Galli, M., Ecker, J.R., and Kay, S.A. (2014). A Genome-Scale Resource for the Functional Characterization of Arabidopsis Transcription Factors. Cell Reports 8, 621-631.

Qin, F., Li, J., Zhang, G.-Y., Zhao, J., Chen, S.-Y., and Liu, Q. (2003). Isolation and structural analysis of DRE-binding transcription factor from maize (Zea mays L.). Acta Botanica Sinica 45, 331-339.

Riechmann, J.L., Heard, J., Martin, G., Reuber, L., Jiang, C.Z., Keddie, J., Adam, L., Pineda, O., Ratcliffe, O.J., Samaha, R.R., et al. (2000). Arabidopsis transcription factors: Genome-wide comparative analysis among eukaryotes. Science (Washington D C) 290, 2105-2110.

Rihan, H.Z., Al-Issawi, M., and Fuller, M.P. (2017). Upregulation of CBF/DREB1 and cold tolerance in artificial seeds of cauliflower (Brassica oleracea var. botrytis). Scientia Horticulturae 225, 299-309.

Schnable, J.C., and Freeling, M. (2011). Genes identified by visible mutant phenotypes show increased bias toward one of two subgenomes of maize. PLoS One 6, e17855.

Seaver, S.M., Bradbury, L.M., Frelin, O., Zarecki, R., Ruppin, E., Hanson, A.D., and Henry, C.S. (2015). Improved evidence-based genome-scale metabolic models for maize leaf, embryo, and endosperm. Front Plant Sci 6, 142.

Shen, B., Allen, W.B., Zheng, P.Z., Li, C.J., Glassman, K., Ranch, J., Nubel, D., and Tarczynski, M.C. (2010). Expression of ZmLEC1 and ZmWRI1 Increases Seed Oil Production in Maize. Plant Physiol 153, 980-987.

Shepard, K.A., and Purugganan, M.D. (2002). The genetics of plant morphological evolution. Current Opinion in Plant Biology 5, 49-55.

Tello-Ruiz, M.K., Naithani, S., Stein, J.C., Gupta, P., Campbell, M., Olson, A., Wei, S.R., Preece, J., Geniza, M.J., Jiao, Y.P., et al. (2018). Gramene 2018: unifying comparative genomics and pathway resources for plant research. Nucleic Acids Res 46, D1181-D1189.

Wang, X.F., Elling, A.A., Li, X.Y., Li, N., Peng, Z.Y., He, G.M., Sun, H., Qi, Y.J., Liu, X.S., and Deng, X.W. (2009). Genome-Wide and Organ-Specific Landscapes of Epigenetic Modifications and Their Relationships to mRNA and Small RNA Transcriptomes in Maize. Plant Cell 21, 1053-1069.

Wang, Z., Libault, M., Joshi, T., Valliyodan, B., Nguyen, H.T., Xu, D., Stacey, G., and Cheng, J.L. (2010). SoyDB: a knowledge database of soybean transcription factors. BMC Plant Biol 10.

Waters, A.J., Makarevitch, I., Eichten, S.R., Swanson-Wagner, R.A., Yeh, C.T., Xu, W.N., Schnable, P.S., Vaughn, M.W., Gehring, M., and Springer, N.M. (2011). Parent-of-Origin Effects on Gene Expression and DNA Methylation in the Maize Endosperm. Plant Cell 23, 4221-4233.

Wei, C.C., Lamesch, P., Arumugam, M., Rosenberg, J., Hu, P., Vidal, M., and Brent, M.R. (2005). Closing in on the C. elegans ORFeome by cloning TWINSCAN predictions. Genome Res 15, 577-582.

Weston, L., Greenwood, J., and Nurse, P. (2017). Genome-wide screen for cell growth regulators in fission yeast. J Cell Sci 130, 2049-2055.

Wiemann, S., Pennacchio, C., Hu, Y.H., Hunter, P., Harbers, M., Amiet, A., Bethel, G., Busse, M., Carninci, P., Diekhans, M., et al. (2016). The ORFeome Collaboration: a genome-scale human ORF-clone resource. Nature Methods 13, 191-192.

Xu, D.Y., Wang, X.F., Huang, C., Xu, G.H., Liang, Y.M., Chen, Q.Y., Wang, C.L., Li, D., Tian, J.G., Wu, L.S., et al. (2017a). Glossy15 Plays an Important Role in the Divergence of the Vegetative Transition between Maize and Its Progenitor, Teosinte. Mol Plant 10, 1579-1583.

Xu, G.H., Wang, X.F., Huang, C., Xu, D.Y., Li, D., Tian, J.G., Chen, Q.Y., Wang, C.L., Liang, Y.M., Wu, Y.Y., et al. (2017b). Complex genetic architecture underlies maize tassel domestication. New Phytol 214, 852-864.

Yamasaki, K., Kigawa, T., Inoue, M., Tateno, M., Yamasaki, T., Yabuki, T., Aoki, M., Seki, E., Matsuda, T., Tomo, Y., et al. (2004). Solution structure of the B3 DNA binding domain of the Arabidopsis cold-responsive transcription factor RAV1. Plant Cell 16, 3448-3459.

Yang, C., Lu, X., Ma, B., Chen, S.Y., and Zhang, J.S. (2015). Ethylene Signaling in Rice and Arabidopsis: Conserved and Diverged Aspects. Mol Plant 8, 495-505.

Yang, F., Li, W., Jiang, N., Yu, H.D., Morohashi, K., Ouma, W.Z., Morales-Mantilla, D.E., Gomez-Cano, F.A., Mukundi, E., Prada-Salcedo, L.D., et al. (2017). A Maize Gene Regulatory Network for Phenolic Metabolism. Mol Plant 10, 498-515.

Yilmaz, A., Nishiyama, M.Y., Jr., Fuentes, B.G., Souza, G.M., Janies, D., Gray, J., and Grotewold, E. (2009). GRASSIUS: a platform for comparative regulatory genomics across the grasses. Plant Physiol 149, 171-180.

Zhang, W.Q., Li, Z., Fang, H., Zhang, M.C., and Duan, L.S. (2018). Analysis of the genetic basis of plant height-related traits in response to ethylene by QTL mapping in maize (Zea mays L.). PLoS One 13.

Zhuang, J., Deng, D.X., Yao, Q.H., Zhang, J.A., Xiong, F., Chen, J.M., and Xiong, A.S. (2010). Discovery, phylogeny and expression patterns of AP2-like genes in maize. Plant Growth Regul 62, 51-58.

Appendix A

Table A1: List of Uncloned TF Genes and Gene Model Rankings

Suggested Model Name | V4 Gene Name | Ranking | FPKM1 | Highest Expressed Tissue2
ZmHB17 | Zm00001d000247 | 5 | #N/A | #N/A
ZmMYBR61 | Zm00001d000110 | 5 | #N/A | #N/A
ZmOrphan126 | Zm00001d000387 | 5 | #N/A | #N/A
ZmOrphan25 | Zm00001d015279 | 5 | 1843.41 | Pollen1 1
ZmHB68 | Zm00001d028129 | 5 | 331.853 | Ovaries Drought stressed
ZmOrphan69 | Zm00001d044525 | 5 | 326.58 | Mesophyll cells 400 1
ZmHB13 | Zm00001d021934 | 5 | 306.899 | Ovaries Drought stressed
ZmMYBR79 | Zm00001d049543 | 5 | 255.402 | 20days old leaves seedling field
ZmEREB102 | Zm00001d020595 | 5 | 172.796 | Ovaries Drought stressed
ZmOrphan237 | Zm00001d023314 | 5 | 168.779 | Leaves 20 days old seedling gc
ZmWRKY7 | Zm00001d044315 | 5 | 161.77 | Endosperm 14DAP
ZmARF10 | Zm00001d042267 | 5 | 159.23 | pre emergence cob
ZmMADS23 | Zm00001d023955 | 5 | 130.573 | Ovule B73
ZmCPP10 | Zm00001d041445 | 5 | 125.641 | post emergence cob
ZmSIG8 | Zm00001d039194 | 5 | 123.203 | Mature leaf
ZmMADS60 | Zm00001d044850 | 5 | 122.559 | Whole anthers
ZmbHLH133 | Zm00001d025727 | 5 | 120.861 | B73 pollen
ZmOrphan341 | Zm00001d015504 | 5 | 119.594 | Leaf Meristems Drought stressed
ZmOrphan104 | Zm00001d012005 | 5 | 98.8813 | Ovaries Drought stressed
ZmOrphan170 | Zm00001d025323 | 5 | 93.7656 | Mesophyll cells 400 1
ZmOrphan323 | Zm00001d040123 | 5 | 89.4336 | Ovaries Drought stressed
ZmHB72 | Zm00001d046568 | 5 | 85.2166 | Leaf Meristems
ZmSBP16 | Zm00001d031487 | 5 | 83.1252 | Ovaries Drought stressed
ZmOrphan201 | Zm00001d008429 | 5 | 82.5778 | Ovaries Drought stressed
ZmTCP33 | Zm00001d039371 | 5 | 82.3295 | Embryo 14DAP
ZmbHLH54 | Zm00001d011847 | 5 | 81.9918 | 20days old leaves seedling field

ZmHB33 | Zm00001d033378 | 5 | 81.0632 | mature silk
ZmOrphan33 | Zm00001d038838 | 5 | 81.0361 | 5 DAP Seed
ZmMADS61 | Zm00001d045231 | 5 | 79.4077 | Tassel primordia 1
ZmOrphan7 | Zm00001d011197 | 5 | 77.9191 | mature silk
ZmCPP5 | Zm00001d031009 | 5 | 75.6341 | 25 DAP embryo
ZmNAC58 | Zm00001d049540 | 5 | 74.4019 | Seedling shoots
ZmbHLH47 | Zm00001d034298 | 5 | 73.4898 | 20days old leaves seedling field
ZmOrphan277 | Zm00001d028075 | 5 | 73.2212 | Shoot apex 1
ZmTHX10 | Zm00001d021389 | 5 | 72.423 | Seedling shoots
ZmGRAS42 | Zm00001d045507 | 5 | 71.7808 | Embryo 14DAP
ZmOrphan68 | Zm00001d011138 | 5 | 70.7289 | post emergence cob
ZmSIG1 | Zm00001d028585 | 5 | 65.4483 | Mature leaf
ZmSIG2 | Zm00001d028585 | 5 | 65.4483 | Mature leaf
ZmSIG3 | Zm00001d028585 | 5 | 65.4483 | Mature leaf
ZmGRAS8 | Zm00001d033680 | 5 | 65.1018 | pre emergence cob
ZmHB6 | Zm00001d053151 | 5 | 63.0999 | Leaf Meristems
ZmOFP22 | Zm00001d052606 | 5 | 60.848 | Developing Leaf
ZmWRKY79 | Zm00001d020137 | 5 | 59.8426 | Leaf Meristems Drought stressed
ZmHB91 | Zm00001d026351 | 5 | 59.2676 | Embryo 14DAP
ZmOrphan331 | Zm00001d007904 | 5 | 56.8074 | mature silk
ZmOrphan197 | Zm00001d044819 | 5 | 52.683 | Whole anthers
ZmSIG4 | Zm00001d049160 | 5 | 52.4864 | Mesophyll cells 400 1
ZmGRAS41 | Zm00001d048199 | 5 | 52.3829 | Seedling shoots
ZmGRAS45 | Zm00001d048199 | 5 | 52.3829 | Seedling shoots
ZmMADS77 | Zm00001d031625 | 5 | 51.6659 | Leaves 20 days old seedling gc
ZmOrphan115 | Zm00001d032637 | 5 | 50.9649 | Ovaries 1 DAP
ZmYAB10 | Zm00001d032502 | 5 | 50.557 | Embryo 14DAP
ZmbZIP33 | Zm00001d012553 | 5 | 50.2895 | 25 DAP endosperm
ZmOrphan29 | Zm00001d046828 | 5 | 49.3647 | P6 Wt3 1
ZmHB58 | Zm00001d027317 | 5 | 48.8861 | pre emergence cob
ZmC3H13 | Zm00001d042363 | 5 | 48.6301 | Bundle sheath cells
ZmOrphan131 | Zm00001d021291 | 5 | 47.1256 | Leaf Meristems
ZmHB51 | Zm00001d027431 | 5 | 46.6263 | Seedling shoots
ZmSIG5 | Zm00001d049494 | 5 | 46.3607 | Mature leaf
ZmOrphan182 | Zm00001d053594 | 5 | 45.4714 | Embryo 14DAP
ZmOrphan54 | Zm00001d026084 | 5 | 45.4621 | Embryo 14DAP
ZmOrphan211 | Zm00001d053259 | 5 | 44.5874 | Ovaries Drought stressed
ZmHB75 | Zm00001d002234 | 5 | 44.1459 | Embryo 14DAP

ZmZIM35 | Zm00001d050365 | 5 | 42.0543 | Leaf Meristems Drought stressed
ZmOrphan82 | Zm00001d044303 | 5 | 41.2418 | Seedling shoots
ZmHB28 | Zm00001d015265 | 5 | 40.8313 | Ovaries Drought stressed
ZmCA2P4 | Zm00001d006835 | 5 | 40.7614 | 25 DAP embryo
ZmOrphan59 | Zm00001d042312 | 5 | 40.7091 | Ovaries Drought stressed
ZmOrphan344 | Zm00001d013412 | 5 | 39.9827 | post pollination
ZmMYB135 | Zm00001d042287 | 5 | 39.6679 | Shoot field1 1
ZmOrphan354 | Zm00001d041035 | 5 | 39.4064 | DAP 10
ZmOrphan213 | Zm00001d032381 | 5 | 39.1249 | Endosperm 14DAP
ZmOrphan110 | Zm00001d014286 | 5 | 38.9137 | Ovaries Drought stressed
ZmHB29 | Zm00001d009562 | 5 | 38.8323 | Ovaries Drought stressed
ZmPHD17 | Zm00001d010974 | 5 | 37.9623 | Ovaries Drought stressed
ZmHB31 | Zm00001d040090 | 5 | 37.8824 | Embryo 14DAP
ZmOrphan283 | Zm00001d028056 | 5 | 37.5299 | Leaves 20 days old seedling gc
ZmOrphan227 | Zm00001d053313 | 5 | 37.1192 | 20 days old leaves seedling field
ZmbZIP71 | Zm00001d053967 | 5 | 37.1165 | Ovaries Drought stressed
ZmARF11 | Zm00001d043431 | 5 | 36.6836 | pre emergence cob
ZmOrphan20 | Zm00001d018481 | 5 | 36.588 | DAP 10
ZmHB36 | Zm00001d033898 | 5 | 36.4877 | Leaf Meristems
ZmOrphan66 | Zm00001d046778 | 5 | 34.8577 | mature silk
ZmOrphan206 | Zm00001d006580 | 5 | 34.3986 | Embryo 14DAP
ZmMTERF12 | Zm00001d049743 | 5 | 34.3727 | Developing Leaf
ZmOrphan172 | Zm00001d014074 | 5 | 33.9944 | Whole anthers
ZmHB37 | Zm00001d013547 | 5 | 33.4124 | Ovaries Drought stressed
ZmOrphan62 | Zm00001d046831 | 5 | 33.2874 | Seedling shoots
ZmALF3 | Zm00001d022110 | 5 | 33.0685 | Embryo 14DAP
ZmNAC81 | Zm00001d047554 | 5 | 32.9315 | Ovaries Drought stressed
ZmFARL3 | Zm00001d034813 | 5 | 32.7788 | Maize EmbryoSac1 1
ZmNAC89 | Zm00001d031898 | 5 | 32.621 | Seedling shoots
ZmOrphan260 | Zm00001d023640 | 5 | 32.5519 | pre emergence cob
ZmOrphan351 | Zm00001d047209 | 5 | 32.0747 | Seedling shoots
ZmNLP9 | Zm00001d025757 | 5 | 31.511 | Ovaries Drought stressed
ZmTUB5 | Zm00001d038691 | 5 | 30.4876 | Leaf Meristems Drought stressed
ZmJMJ17 | Zm00001d029392 | 5 | 29.756 | Bundle sheath cell 400 1
ZmMYB125 | Zm00001d032206 | 5 | 29.6261 | Developing Leaf
ZmHSF9 | Zm00001d048041 | 5 | 29.6094 | Mesophyll cells
ZmOrphan200 | Zm00001d052943 | 5 | 29.5082 | Endosperm 14DAP
ZmPHD23 | Zm00001d035933 | 5 | 29.2552 | 25 DAP endosperm

ZmCAMTA2 | Zm00001d021516 | 5 | 28.938 | pre emergence cob
ZmOrphan190 | Zm00001d035208 | 5 | 28.1412 | Mesophyll cells
ZmC3H20 | Zm00001d022427 | 5 | 27.9201 | Embryo 14DAP
ZmSBP17 | Zm00001d034730 | 5 | 27.8256 | pre emergence cob
ZmOrphan114 | Zm00001d030518 | 5 | 27.7554 | Endosperm 14DAP
ZmHB87 | Zm00001d046405 | 5 | 27.6788 | Seedling Roots
ZmbZIP49 | Zm00001d031790 | 5 | 27.4337 | Embryo 14DAP
ZmGRAS44 | Zm00001d013465 | 5 | 27.3951 | DAP 10
ZmHB30 | Zm00001d027435 | 5 | 27.2958 | Leaves 20 days old seedling gc
ZmGLK27 | Zm00001d007962 | 5 | 27.1512 | Embryo 14DAP
ZmOrphan225 | Zm00001d013276 | 5 | 26.8963 | mature silk
ZmMYB157 | Zm00001d016000 | 5 | 26.8438 | pre emergence cob
ZmMYBR74 | Zm00001d016000 | 5 | 26.8438 | pre emergence cob
ZmOrphan279 | Zm00001d052110 | 5 | 26.6152 | 25 DAP embryo
ZmMYBR69 | Zm00001d045581 | 5 | 26.5193 | 25 DAP endosperm
ZmPLATZ6 | Zm00001d051376 | 5 | 26.1049 | post pollination
ZmFARL1 | Zm00001d028472 | 5 | 26.0696 | 21 days old leaves seedling field
ZmOrphan234 | Zm00001d016898 | 5 | 25.7581 | mature silk
ZmOrphan264 | Zm00001d036710 | 5 | 25.2941 | Mesophyll cells 400 1
ZmNAC99 | Zm00001d048059 | 5 | 25.1874 | Ovaries Drought stressed
ZmHB119 | Zm00001d031061 | 5 | 25.1789 | pre emergence cob
ZmMTERF23 | Zm00001d018756 | 5 | 24.5325 | Mesophyll cells 400 1
ZmMTERF24 | Zm00001d018756 | 5 | 24.5325 | Mesophyll cells 400 1
ZmHB15 | Zm00001d030069 | 5 | 24.2215 | pre emergence cob
ZmOrphan302 | Zm00001d031098 | 5 | 24.1813 | Ovaries Drought stressed
ZmMYBR85 | Zm00001d010967 | 5 | 24.1204 | DAP 10
ZmHSF18 | Zm00001d016255 | 5 | 23.9785 | Ovaries Drought stressed
ZmARF22 | Zm00001d036593 | 5 | 23.9258 | pre emergence cob
ZmOrphan135 | Zm00001d006212 | 5 | 23.9218 | Leaf Meristems
ZmMADS44 | Zm00001d015381 | 5 | 23.9088 | DAP 10
ZmPHD25 | Zm00001d013278 | 5 | 23.3712 | post emergence cob
ZmOrphan274 | Zm00001d030864 | 5 | 23.3672 | mature silk
ZmbZIP125 | Zm00001d037170 | 5 | 23.1713 | Embryo 14DAP
ZmGLK43 | Zm00001d038527 | 5 | 23.0267 | Leaf Meristems
ZmEREB41 | Zm00001d007840 | 5 | 22.9572 | Shoot apex 1
ZmPHD7 | Zm00001d015064 | 5 | 22.8202 | Embryo 14DAP
ZmMADS52 | Zm00001d052648 | 5 | 22.6017 | Endosperm 14DAP
ZmCA2P3 | Zm00001d029489 | 5 | 22.3871 | mature silk

ZmPHD9 | Zm00001d026270 | 5 | 22.0473 | Seedling shoots
ZmMYBR62 | Zm00001d037836 | 5 | 21.5681 | post emergence cob
ZmPHD12 | Zm00001d035487 | 5 | 21.0888 | Ovaries Drought stressed
ZmOrphan210 | Zm00001d016332 | 5 | 20.9543 | Bundle sheath cells
ZmHB113 | Zm00001d011560 | 5 | 20.8473 | Ovule B73
ZmMYB117 | Zm00001d039306 | 5 | 20.6179 | pre emergence cob
ZmTUB10 | Zm00001d038310 | 5 | 20.5059 | post emergence cob
ZmbHLH130 | Zm00001d049294 | 5 | 20.4569 | mature silk
ZmMYB126 | Zm00001d016952 | 5 | 19.9804 | Ovaries 1 DAP
ZmOrphan316 | Zm00001d002046 | 5 | 19.3331 | DAP 10
ZmHB49 | Zm00001d033005 | 5 | 19.3212 | Developing Leaf
ZmMYBR82 | Zm00001d020947 | 5 | 19.1918 | Embryo 14DAP
ZmbHLH152 | Zm00001d016873 | 5 | 19.1032 | Endosperm 14DAP
ZmPHD37 | Zm00001d045080 | 5 | 18.9293 | pre emergence cob
ZmOrphan306 | Zm00001d021785 | 5 | 18.8472 | Seedling shoots
ZmHB80 | Zm00001d053749 | 5 | 18.4645 | Ovaries Drought stressed
ZmOrphan4 | Zm00001d052781 | 5 | 18.4505 | Ovaries Drought stressed
ZmSBP9 | Zm00001d008176 | 5 | 18.023 | Seedling shoots
ZmSBP11 | Zm00001d010309 | 5 | 17.7472 | Ovaries Drought stressed
ZmPHD8 | Zm00001d026698 | 5 | 17.5037 | post emergence cob
ZmLFY1 | Zm00001d002449 | 5 | 17.3533 | Tassel primordia 1
ZmOrphan238 | Zm00001d017977 | 5 | 17.3206 | 25 DAP embryo
ZmOrphan330 | Zm00001d019495 | 5 | 17.0484 | DAP 10
ZmOrphan257 | Zm00001d017241 | 5 | 16.9892 | Leaf primordia
ZmARR8 | Zm00001d018380 | 5 | 16.8622 | Ovaries Drought stressed
ZmGLK25 | Zm00001d010634 | 5 | 16.8507 | Leaf Meristems
ZmOrphan165 | Zm00001d035776 | 5 | 16.7387 | Endosperm 14DAP
ZmOrphan137 | Zm00001d021714 | 5 | 16.7167 | Embryo 14DAP
ZmPHD26 | Zm00001d021541 | 5 | 16.703 | post emergence cob
ZmSBP1 | Zm00001d049822 | 5 | 16.3328 | pre emergence cob
ZmbHLH12 | Zm00001d052038 | 5 | 16.2797 | Bundle sheath cell 400 1
ZmHB78 | Zm00001d029934 | 5 | 16.2083 | Ovaries Drought stressed
ZmSRS3 | Zm00001d053208 | 5 | 16.0621 | Embryo 14DAP
ZmNAC11 | Zm00001d028995 | 5 | 16.0544 | Ovaries Drought stressed
ZmGRAS3 | Zm00001d030744 | 5 | 16.0462 | Embryo 14DAP
ZmTCP21 | Zm00001d031725 | 5 | 15.9719 | Leaf Meristems Drought stressed
ZmbZIP66 | Zm00001d014710 | 5 | 15.942 | Seedling Roots
ZmPHD32 | Zm00001d035862 | 5 | 15.9024 | Seedling shoots

ZmJMJ21 | Zm00001d051849 | 5 | 15.8627 | pre emergence cob
ZmPHD41 | Zm00001d019907 | 5 | 15.6222 | pre emergence cob
ZmOrphan24 | Zm00001d027598 | 5 | 15.5735 | post pollination
ZmNAC36 | Zm00001d003052 | 5 | 15.2783 | Ovaries Drought stressed
ZmSBP32 | Zm00001d049824 | 5 | 14.9693 | post emergence cob
ZmHB92 | Zm00001d038812 | 5 | 14.9341 | P7 blade L12 1
ZmOrphan10 | Zm00001d013967 | 5 | 14.7299 | DAP 10
ZmMADS19 | Zm00001d016957 | 5 | 14.6641 | Whole anthers
ZmPHD5 | Zm00001d053989 | 5 | 14.6204 | Endosperm 14DAP
ZmMYB78 | Zm00001d032194 | 5 | 14.5901 | Shoot apex 1
ZmC3H23 | Zm00001d008323 | 5 | 14.1583 | Ovaries Drought stressed
ZmOrphan155 | Zm00001d041395 | 5 | 14.142 | Ovaries Drought stressed
ZmPHD39 | Zm00001d053697 | 5 | 14.0749 | Endosperm 14DAP
ZmHB69 | Zm00001d032681 | 5 | 14.0463 | pre emergence cob
ZmFARL5 | Zm00001d041923 | 5 | 13.9045 | Shoot apex 1
ZmMYB32 | Zm00001d042665 | 5 | 13.8159 | Ovaries Drought stressed
ZmEREB110 | Zm00001d010987 | 5 | 13.71 | Embryo 14DAP
ZmLFY2 | Zm00001d026231 | 5 | 13.655 | Tassel primordia 1
ZmEREB187 | Zm00001d018050 | 5 | 13.6516 | Ovaries Drought stressed
ZmNAC122 | Zm00001d022424 | 5 | 13.4606 | Seedling shoots
ZmOrphan173 | Zm00001d009364 | 5 | 13.3532 | Mesophyll cells 400 1
ZmOrphan91 | Zm00001d034655 | 5 | 13.3468 | Maize EmbryoSac1 1
ZmJMJ9 | Zm00001d051961 | 5 | 13.3238 | P7 sheath L12 1
ZmARF29 | Zm00001d026540 | 5 | 13.2678 | Shoot apex 1
ZmPHD31 | Zm00001d006428 | 5 | 13.2261 | DAP 10
ZmE2F13 | Zm00001d052288 | 5 | 13.171 | Ovaries Drought stressed
ZmMADS54 | Zm00001d039913 | 5 | 13.152 | 25 DAP endosperm
ZmHB103 | Zm00001d010758 | 5 | 13.016 | Endosperm 14DAP
ZmbHLH96 | Zm00001d007382 | 5 | 12.8935 | Shoot field1 1
ZmOrphan305 | Zm00001d032692 | 5 | 12.8834 | mature silk
ZmE2F9 | Zm00001d011597 | 5 | 12.8517 | Shoot apex 1
ZmGLK26 | Zm00001d028984 | 5 | 12.6793 | Seedling Roots
ZmOrphan72 | Zm00001d038655 | 5 | 12.668 | Ovaries Drought stressed
ZmOrphan295 | Zm00001d021636 | 5 | 12.6075 | pre emergence cob
ZmWRKY100 | Zm00001d038761 | 5 | 12.5039 | Shoot field1 1
ZmEIL3 | Zm00001d031445 | 5 | 12.4267 | pre pollination tassel
ZmOrphan347 | Zm00001d015268 | 5 | 12.3782 | Developing Leaf
ZmHB3 | Zm00001d039002 | 5 | 12.2302 | Endosperm 14DAP

ZmGBP9 | Zm00001d047421 | 5 | 12.1987 | Embryo 14DAP
ZmOrphan248 | Zm00001d014297 | 5 | 12.1911 | mature silk
ZmFARL2 | Zm00001d034282 | 5 | 12.1875 | pre emergence cob
ZmOrphan193 | Zm00001d047732 | 5 | 12.1809 | Leaf Meristems Drought stressed
ZmMYB162 | Zm00001d020457 | 5 | 11.8912 | mature silk
ZmABI25 | Zm00001d033324 | 5 | 11.8133 | Shoot apex 1
ZmC3H25 | Zm00001d016836 | 5 | 11.7404 | pre emergence cob
ZmHB99 | Zm00001d043915 | 5 | 11.7303 | pre emergence cob
ZmTHX44 | Zm00001d015605 | 5 | 11.6867 | Seedling Roots
ZmARID7 | Zm00001d014900 | 5 | 11.5749 | pre emergence cob
ZmPHD29 | Zm00001d014900 | 5 | 11.5749 | pre emergence cob
ZmMYB15 | Zm00001d042910 | 5 | 11.4855 | Ovaries Drought stressed
ZmPHD4 | Zm00001d023914 | 5 | 11.4321 | Ovaries Drought stressed
ZmSBP5 | Zm00001d006091 | 5 | 11.3673 | pre emergence cob
ZmFARL14 | Zm00001d046441 | 5 | 11.3593 | Ovaries Drought stressed
ZmMYB7 | Zm00001d006922 | 5 | 11.2215 | DAP 10
ZmOrphan348 | Zm00001d036214 | 5 | 11.1485 | mature silk
ZmMYBR28 | Zm00001d016784 | 5 | 11.1447 | pre emergence cob
ZmMYBR38 | Zm00001d016784 | 5 | 11.1447 | pre emergence cob
ZmMYBR99 | Zm00001d013827 | 5 | 11.1025 | pre emergence cob
ZmOrphan156 | Zm00001d033968 | 5 | 11.0943 | Ovaries Drought stressed
ZmMYB58 | Zm00001d033075 | 5 | 11.0604 | Endosperm 14DAP
ZmNLP15 | Zm00001d015201 | 5 | 10.9569 | Ovaries Drought stressed
ZmNLP7 | Zm00001d027510 | 5 | 10.9483 | Leaves 20 days old seedling gc
ZmbHLH21 | Zm00001d048229 | 5 | 10.7184 | Tassel primordia 1
ZmOrphan60 | Zm00001d051075 | 5 | 10.6473 | pre pollination tassel
ZmWRKY61 | Zm00001d047309 | 5 | 10.5749 | Seedling shoots
ZmNAC113 | Zm00001d014405 | 5 | 10.5239 | post pollination
ZmOrphan32 | Zm00001d047412 | 5 | 10.4472 | Ear primordia 1
ZmOrphan352 | Zm00001d007240 | 5 | 10.3793 | Bundle sheath cell 400 1
ZmNAC108 | Zm00001d041472 | 5 | 10.2044 | Leaf Meristems Drought stressed
ZmCPP3 | Zm00001d010713 | 5 | 10.0868 | 5 DAP Seed
ZmOrphan199 | Zm00001d037688 | 5 | 9.9961 | P7 sheath L13 1
ZmC3H17 | Zm00001d033610 | 5 | 9.46078 | post emergence cob
ZmbZIP11 | Zm00001d042777 | 5 | 9.38895 | Seedling shoots
ZmbZIP114 | Zm00001d042777 | 5 | 9.38895 | Seedling shoots
ZmFARL9 | Zm00001d017165 | 5 | 9.34658 | Ovaries Drought stressed
ZmFARL4 | Zm00001d039612 | 5 | 9.32145 | Bundle sheath cell 400 1

ZmOrphan15 | Zm00001d043691 | 5 | 9.31484 | pre emergence cob
ZmCPP4 | Zm00001d038660 | 5 | 9.29148 | Embryo 14DAP
ZmGLK29 | Zm00001d051527 | 5 | 9.2745 | Embryo 14DAP
ZmDBP4 | Zm00001d046450 | 5 | 9.22921 | Leaf primordia
ZmHSF19 | Zm00001d029270 | 5 | 9.19631 | Shoot apex 1
ZmOrphan73 | Zm00001d041983 | 5 | 9.10821 | DAP 10
ZmABI20 | Zm00001d011745 | 5 | 9.07804 | Ovaries Drought stressed
ZmWRKY85 | Zm00001d018656 | 5 | 9.06064 | 5 DAP Seed
ZmNAC123 | Zm00001d035084 | 5 | 8.82491 | 5 DAP Seed
ZmOrphan125 | Zm00001d020129 | 5 | 8.75057 | pre emergence cob
ZmJMJ5 | Zm00001d020367 | 5 | 8.67381 | Endosperm 14DAP
ZmCAMTA5 | Zm00001d025235 | 5 | 8.6691 | DAP 10
ZmJMJ19 | Zm00001d018647 | 5 | 8.66462 | Whole anthers
ZmGBP1 | Zm00001d035224 | 5 | 8.46822 | Embryo 14DAP
ZmNLP13 | Zm00001d021442 | 5 | 8.44773 | Leaf Meristems
ZmMYB35 | Zm00001d005760 | 5 | 8.27948 | mature silk
ZmOrphan229 | Zm00001d007530 | 5 | 8.17074 | Bundle sheath cell 400 1
ZmCA2P1 | Zm00001d013501 | 5 | 8.1343 | Leaf Meristems Drought stressed
ZmPHD10 | Zm00001d011158 | 5 | 8.05921 | pre emergence cob
ZmGLK28 | Zm00001d018698 | 5 | 7.98002 | pre pollination tassel
ZmMYB48 | Zm00001d018097 | 5 | 7.9768 | Seedling shoots
ZmOrphan281 | Zm00001d051689 | 5 | 7.93274 | Embryo 14DAP
ZmOrphan342 | Zm00001d029297 | 5 | 7.59933 | Shoot apex 1
ZmFARL7 | Zm00001d016772 | 5 | 7.57886 | Ovaries Drought stressed
ZmHB71 | Zm00001d018225 | 5 | 7.56383 | Embryo 14DAP
ZmPHD40 | Zm00001d041708 | 5 | 7.54788 | pre emergence cob
ZmNAC98 | Zm00001d046246 | 5 | 7.53774 | Maize EmbryoSac1 1
ZmHB27 | Zm00001d015671 | 5 | 7.50522 | Seedling Roots
ZmFARL11 | Zm00001d021617 | 5 | 7.3837 | Ovaries Drought stressed
ZmJMJ8 | Zm00001d023788 | 5 | 7.24182 | pre emergence cob
ZmPHD19 | Zm00001d018912 | 5 | 7.17131 | Endosperm 14DAP
ZmPHD27 | Zm00001d045755 | 5 | 7.16621 | pre emergence cob
ZmNAC47 | Zm00001d008817 | 5 | 6.81488 | Seedling shoots
ZmbHLH6 | Zm00001d006022 | 5 | 6.60868 | Embryo 14DAP
ZmGLK16 | Zm00001d001936 | 5 | 6.41434 | Ovule B73
ZmGRAS34 | Zm00001d048604 | 5 | 6.3976 | Leaf Meristems Drought stressed
ZmGLK32 | Zm00001d046981 | 5 | 6.36837 | post pollination
ZmOrphan318 | Zm00001d006553 | 5 | 6.23538 | pre emergence cob

ZmHB10 | Zm00001d037140 | 5 | 6.18012 | Embryo 14DAP
ZmTCP9 | Zm00001d044836 | 5 | 6.15987 | post emergence cob
ZmWRKY66 | Zm00001d012413 | 5 | 6.00287 | Leaf Meristems
ZmHB47 | Zm00001d038811 | 5 | 5.93879 | Ear primordia 1
ZmHB52 | Zm00001d008869 | 5 | 5.71219 | mature silk
ZmCPP7 | Zm00001d012688 | 5 | 5.7047 | Ovaries Drought stressed
ZmOrphan317 | Zm00001d051889 | 5 | 5.52816 | Developing Leaf
ZmMYBR114 | Zm00001d004521 | 5 | 5.51641 | Shoot apex 1
ZmOrphan339 | Zm00001d041453 | 5 | 5.50432 | Seedling shoots
ZmE2F4 | Zm00001d016737 | 5 | 4.98463 | pre emergence cob
ZmZIM25 | Zm00001d037082 | 5 | 4.83495 | Leaf Meristems Drought stressed
ZmPHD6 | Zm00001d028863 | 5 | 4.74744 | pre emergence cob
ZmWRKY90 | Zm00001d012505 | 5 | 4.72077 | Ovule B73
ZmFARL8 | Zm00001d017164 | 5 | 4.57874 | Leaf primordia
ZmbHLH9 | Zm00001d047999 | 5 | 4.24041 | Ovule B73
ZmHB104 | Zm00001d026088 | 5 | 4.14939 | Leaves 20 days old seedling gc
ZmbHLH146 | Zm00001d020932 | 5 | 4.14187 | Mature leaf
ZmPHD42 | Zm00001d011802 | 5 | 3.92766 | Endosperm 14DAP
ZmJMJ11 | Zm00001d012119 | 5 | 3.49872 | pre emergence cob
ZmHB105 | Zm00001d033379 | 5 | 3.41701 | Shoot apex 1
ZmJMJ4 | Zm00001d051965 | 5 | 3.37037 | Endosperm 14DAP
ZmGRAS51 | Zm00001d003553 | 5 | 3.30631 | Embryo 14DAP
ZmOrphan289 | Zm00001d022500 | 5 | 3.22602 | mature silk
ZmWRKY5 | Zm00001d044680 | 5 | 3.17571 | Leaf Meristems Drought stressed
ZmOFP12 | Zm00001d007168 | 5 | 3.15477 | pre emergence cob
ZmEIL8 | Zm00001d016924 | 5 | 2.78751 | 20 days old leaves seedling field
ZmSRS6 | Zm00001d038081 | 5 | 2.69353 | Endosperm 14DAP
ZmNAC19 | Zm00001d016119 | 5 | 2.5347 | Embryo 14DAP
ZmJMJ18 | Zm00001d042447 | 5 | 2.33226 | Embryo 14DAP
ZmGRAS72 | Zm00001d048602 | 5 | 2.28451 | Leaf Meristems Drought stressed
ZmGLK33 | Zm00001d037015 | 5 | 2.0624 | Seedling shoots
ZmbZIP109 | Zm00001d023424 | 5 | 1.89377 | 25 DAP endosperm
ZmNAC18 | Zm00001d021423 | 5 | 1.8323 | Shoot field1 1
ZmNAC68 | Zm00001d045616 | 5 | 1.75466 | Ovaries Drought stressed
ZmGRAS84 | Zm00001d048681 | 5 | 1.65106 | Embryo 14DAP
ZmHB118 | Zm00001d043231 | 5 | 1.58683 | Endosperm 14DAP
ZmMYB80 | Zm00001d034467 | 5 | 1.40337 | mature silk
ZmWRKY112 | Zm00001d026218 | 5 | 1.3935 | 5 DAP Seed

ZmMYB55 | Zm00001d028842 | 5 | 1.35359 | mature silk
ZmOrphan30 | Zm00001d024909 | 5 | 1.33236 | 25 DAP endosperm
ZmCA5P12 | Zm00001d005993 | 5 | 1.26495 | pre emergence cob
ZmHB40 | Zm00001d042228 | 5 | 1.20681 | P7 ligule L11 1
ZmOrphan293 | Zm00001d004390 | 5 | 1.05887 | Maize EmbryoSac1 1
ZmbZIP19 | Zm00001d043908 | 5 | 1.02839 | Ovule B73
ZmbZIP29 | Zm00001d043908 | 5 | 1.02839 | Ovule B73
ZmbZIP44 | Zm00001d043908 | 5 | 1.02839 | Ovule B73
ZmbZIP8 | Zm00001d043908 | 5 | 1.02839 | Ovule B73
ZmJMJ6 | Zm00001d051959 | 5 | 1.00169 | P7 ligule L11 1
ZmGLK22 | Zm00001d049511 | 5 | 0.988772 | pre emergence cob
ZmbHLH87 | Zm00001d020490 | 5 | 0.880755 | pre emergence cob
ZmOrphan256 | Zm00001d005366 | 5 | 0.818266 | Shoot apex 1
ZmOrphan325 | Zm00001d005366 | 5 | 0.818266 | Shoot apex 1
ZmNAC10 | Zm00001d017396 | 5 | 0.330473 | pre pollination tassel
ZmALF16 | no gene found | 4 | #N/A | #N/A
ZmbZIP69 | no gene found | 4 | #N/A | #N/A
ZmCA2P9 | no gene found | 4 | #N/A | #N/A
ZmGATA23 | no gene found | 4 | #N/A | #N/A
ZmMYBR108 | no gene found | 4 | #N/A | #N/A
ZmNAC52 | no gene found | 4 | #N/A | #N/A
ZmOrphan128 | no gene found | 4 | #N/A | #N/A
ZmMYBR48 | Zm00001d024547 | 4 | 2113.12 | Mature leaf
ZmMYB150 | Zm00001d014364 | 4 | 1660.89 | B73 pollen
ZmOrphan50 | Zm00001d038395 | 4 | 350.21 | B73 pollen
ZmCSD3 | Zm00001d053908 | 4 | 272.348 | Embryo 14DAP
ZmTHX34 | Zm00001d050583 | 4 | 157.881 | Maize b3 control
ZmMYBR40 | Zm00001d013777 | 4 | 145.614 | 25 DAP embryo
ZmWRKY48 | Zm00001d015515 | 4 | 121.812 | Leaf Meristems Drought stressed
ZmC3H45 | Zm00001d029010 | 4 | 95.7096 | Embryo 14DAP
ZmGLK10 | Zm00001d014701 | 4 | 93.9322 | Mature leaf
ZmGRAS12 | Zm00001d044065 | 4 | 93.5627 | Embryo 14DAP
ZmGRAS46 | Zm00001d044065 | 4 | 93.5627 | Embryo 14DAP
ZmSRS4 | Zm00001d014762 | 4 | 92.3244 | Embryo 14DAP
ZmNAC60 | Zm00001d013003 | 4 | 67.2261 | Seedling shoots
ZmZHD17 | Zm00001d051573 | 4 | 63.4655 | Embryo 14DAP
ZmCA5P13 | Zm00001d031310 | 4 | 63.4279 | Embryo 14DAP
ZmSRS5 | Zm00001d036426 | 4 | 61.1177 | Embryo 14DAP

ZmOrphan284 | Zm00001d049684 | 4 | 60.0287 | Embryo 14DAP
ZmbZIP75 | Zm00001d012296 | 4 | 58.3472 | Embryo 14DAP
ZMTHX21 | Zm00001d054080 | 4 | 58.1637 | Embryo 14DAP
ZmSRS9 | Zm00001d011843 | 4 | 56.0747 | Embryo 14DAP
ZmOrphan140 | Zm00001d011132 | 4 | 55.3653 | Ovaries Drought stressed
ZmOrphan185 | Zm00001d005051 | 4 | 49.9326 | 20 days old leaves seedling field
ZmBZR2 | Zm00001d039439 | 4 | 48.7884 | pre emergence cob
ZmOrphan187 | Zm00001d050614 | 4 | 48.6616 | Whole anthers
ZmbHLH91 | Zm00001d047017 | 4 | 43.5748 | B73 pollen
ZmOrphan16 | Zm00001d018859 | 4 | 41.7514 | Ovaries Drought stressed
ZmTHX22 | Zm00001d014938 | 4 | 37.8164 | Embryo 14DAP
ZmTHX12 | Zm00001d002801 | 4 | 37.5316 | Developing Leaf
ZmOrphan87 | Zm00001d015468 | 4 | 36.9527 | Leaves 20 days old seedling gc
ZmGRAS37 | Zm00001d005029 | 4 | 35.4377 | Embryo 14DAP
ZmZHD12 | Zm00001d017784 | 4 | 33.5814 | Embryo 14DAP
ZmNLP4 | Zm00001d009017 | 4 | 33.1372 | 20 days old leaves seedling field
ZmC3H51 | Zm00001d033553 | 4 | 31.4615 | pre emergence cob
ZmARF35 | Zm00001d014690 | 4 | 31.21 | 5 DAP Seed
ZmbHLH121 | Zm00001d006065 | 4 | 30.5288 | Shoot field1 1
ZmARR6 | Zm00001d042463 | 4 | 29.9698 | Seedling shoots
ZmMYBR24 | Zm00001d008808 | 4 | 29.6104 | Seedling Roots
ZmOrphan21 | Zm00001d001953 | 4 | 29.5157 | mature silk
ZmTCP43 | Zm00001d007868 | 4 | 29.4661 | Embryo 14DAP
ZmGRAS15 | Zm00001d013465 | 4 | 27.3951 | DAP 10
ZmTHX40 | Zm00001d050698 | 4 | 25.9021 | 25 DAP embryo
ZmZHD3 | Zm00001d031840 | 4 | 24.9556 | Ear primordia 1
ZmbHLH149 | Zm00001d030028 | 4 | 24.2003 | Leaf Meristems Drought stressed
ZmOrphan86 | Zm00001d013112 | 4 | 24.0974 | post emergence cob
ZmMYBR63 | Zm00001d035470 | 4 | 23.6132 | Ovaries Drought stressed
ZmE2F8 | Zm00001d016907 | 4 | 22.7506 | Ovaries Drought stressed
ZmOrphan262 | Zm00001d004357 | 4 | 22.7412 | Endosperm 14DAP
ZmGRAS22 | Zm00001d026050 | 4 | 21.7338 | Embryo 14DAP
ZmGATA32 | Zm00001d010785 | 4 | 20.8221 | Embryo 14DAP
ZmOrphan241 | Zm00001d048437 | 4 | 20.7048 | Ovaries 1 DAP
ZmARF20 | Zm00001d015243 | 4 | 20.5354 | mature silk
ZmHB9 | Zm00001d012751 | 4 | 20.0803 | pre emergence cob
ZmSBP7 | Zm00001d052905 | 4 | 20.0466 | Ovaries Drought stressed
ZmTHX18 | Zm00001d027335 | 4 | 19.4843 | Developing Leaf

Table A1 (continued): List of Uncloned TF Genes and Gene Model Rankings

Suggested Model Name  V4 Gene Name    Ranking  FPKM(1)   Highest Expressed Tissue(2)
ZmZHD16               Zm00001d005931  4        19.4584   Embryo 14DAP
ZmOrphan157           Zm00001d033786  4        19.4475   Bundle sheath cell 400 1
ZmCA5P4               Zm00001d034872  4        19.3528   Embryo 14DAP
ZmbHLH142             Zm00001d036807  4        18.8769   Developing Leaf
ZmHB32                Zm00001d043452  4        18.4609   Ovaries Drought stressed
ZmHB89                Zm00001d012419  4        18.4393   Mature leaf
ZmFARL15              Zm00001d026485  4        18.1535   Ovaries Drought stressed
ZmbHLH140             Zm00001d027419  4        18.0305   Developing Leaf
ZmHB18                Zm00001d024701  4        17.0512   Embryo 14DAP
ZmPHD34               Zm00001d039579  4        16.5419   Embryo 14DAP
ZmOrphan350           Zm00001d053566  4        16.2764   Embryo 14DAP
ZmTHX9                Zm00001d006331  4        16.1862   Seedling shoots
ZmMYBR59              Zm00001d028380  4        15.6102   Mesophyll cells
ZmTHX37               Zm00001d015412  4        15.4973   Embryo 14DAP
ZmTHX35               Zm00001d001894  4        15.0369   Shoot apex 1
ZmMYBR103             Zm00001d031454  4        14.2883   Embryo 14DAP
ZmbZIP127             Zm00001d031723  4        13.5817   Embryo 14DAP
ZmTHX19               Zm00001d013849  4        13.4873   mature silk
ZmbZIP42              Zm00001d040500  4        12.7518   Ovaries Drought stressed
ZmWRKY69              Zm00001d009595  4        12.221    pre pollination tassel
ZmOFP18               Zm00001d043535  4        12.1251   Embryo 14DAP
ZmCPP13               Zm00001d041444  4        11.9913   Endosperm 14DAP
ZmOrphan38            Zm00001d039639  4        11.9674   Developing Leaf
ZmOrphan218           Zm00001d004875  4        11.5986   Leaf Meristems
ZmWRKY68              Zm00001d011133  4        11.4437   Leaf Meristems Drought stressed
ZmTHX24               Zm00001d016876  4        11.4084   Embryo 14DAP
ZmPHD1                Zm00001d035008  4        11.3316   pre emergence cob
ZmOrphan102           Zm00001d011195  4        11.1716   post pollination
ZmOrphan212           Zm00001d033941  4        10.9236   Developing Leaf
ZmNAC129              Zm00001d037315  4        10.7564   pre emergence cob
ZmNAC69               Zm00001d037315  4        10.7564   pre emergence cob
ZmNAC8                Zm00001d037315  4        10.7564   pre emergence cob
ZmWRKY83              Zm00001d038023  4        10.6296   Leaf Meristems
ZmSBP22               Zm00001d042319  4        10.2583   Ovaries Drought stressed
ZmNAC105              Zm00001d049687  4        9.83951   20 days old leaves seedling field
ZmMTERF28             Zm00001d010340  4        9.74165   Seedling Roots
ZmTHX41               Zm00001d040682  4        9.54224   Tassel primordia 1
ZmTCP29               Zm00001d012725  4        9.52122   Leaf primordia


Table A1 (continued): List of Uncloned TF Genes and Gene Model Rankings

Suggested Model Name  V4 Gene Name    Ranking  FPKM(1)   Highest Expressed Tissue(2)
ZmbZIP22              Zm00001d021191  4        9.46241   Endosperm 14DAP
ZmARF15               Zm00001d051172  4        9.38076   Embryo 14DAP
ZmMYB24               Zm00001d011614  4        8.87381   Ovaries Drought stressed
ZmOrphan52            Zm00001d021976  4        8.86931   Ovule B73
ZmDOF46               Zm00001d041549  4        8.57685   Embryo 14DAP
ZmABI35               Zm00001d017112  4        8.20723   Ovaries Drought stressed
ZmbZIP47              Zm00001d021189  4        7.4219    Ovule B73
ZmOrphan9             Zm00001d010366  4        7.31646   Developing Leaf
ZmbHLH138             Zm00001d005740  4        7.19098   Embryo 14DAP
ZmOrphan40            Zm00001d009620  4        7.18888   Bundle sheath cell 400 1
ZmbHLH46              Zm00001d039272  4        7.18865   Ovule B73
ZmWRKY8               Zm00001d053369  4        7.06232   Embryo 14DAP
ZmOrphan188           Zm00001d012441  4        6.91799   Maize EmbryoSac1 1
ZmMYB49               Zm00001d003412  4        6.62542   Ovule B73
ZmGRAS26              Zm00001d048603  4        6.61512   Seedling shoots
ZmWRKY17              Zm00001d022437  4        6.38267   Leaf Meristems Drought stressed
ZmSBP14               Zm00001d036692  4        6.35543   P7 ligule L12 1
ZmSBP18               Zm00001d012015  4        6.17527   Ovaries Drought stressed
ZmGBP5                Zm00001d013451  4        6.00139   Embryo 14DAP
ZmOFP7                Zm00001d002594  4        5.98341   Embryo 14DAP
ZmC3H44               Zm00001d041639  4        5.81886   Ovaries Drought stressed
ZmTHX14               Zm00001d011571  4        5.77331   Embryo 14DAP
ZmPHD28               Zm00001d048969  4        5.67413   pre emergence cob
ZmMADS20              Zm00001d018587  4        5.6023    Maize EmbryoSac1 1
ZmbZIP85              Zm00001d006027  4        5.43733   Embryo 14DAP
ZmC2H25               Zm00001d028361  4        5.39956   mature silk
ZmWRKY93              Zm00001d039245  4        4.9499    Maize b1 1d
ZmGLK4                Zm00001d023402  4        4.64114   Maize b1 1d
ZmEREB129             Zm00001d016616  4        4.58646   5 DAP Seed
ZmMYBR11              Zm00001d017507  4        4.25968   Leaf primordia
ZmEREB137             Zm00001d031861  4        4.14296   Embryo 14DAP
ZmOrphan322           Zm00001d007050  4        4.12666   25 DAP embryo
ZmMYBR39              Zm00001d044118  4        3.95501   pre pollination tassel
ZmZIM19               Zm00001d034536  4        3.95224   Mesophyll cells 400 1
ZmPHD21               Zm00001d045629  4        3.91044   mature silk
ZmGRAS63              Zm00001d022034  4        3.87753   pre emergence cob
ZmOrphan98            Zm00001d040369  4        3.60893   Mesophyll cells
ZmOrphan111           Zm00001d041437  4        3.55863   Embryo 14DAP


Table A1 (continued): List of Uncloned TF Genes and Gene Model Rankings

Suggested Model Name  V4 Gene Name    Ranking  FPKM(1)   Highest Expressed Tissue(2)
ZmZIM17               Zm00001d044707  4        3.27846   Shoot apex 1
ZmTHX27               Zm00001d042917  4        3.19255   Tassel primordia 1
ZmMYB159              Zm00001d020408  4        3.15942   pre emergence cob
ZmWRKY54              Zm00001d042903  4        3.02185   pre pollination tassel
ZmZHD7                Zm00001d050443  4        2.96919   Ear primordia 1
ZmNAC80               Zm00001d042246  4        2.93825   Ovaries Drought stressed
ZmHB12                Zm00001d048227  4        2.79749   Ovaries Drought stressed
ZmHB56                Zm00001d050834  4        2.79433   Maize EmbryoSac1 1
ZmOrphan312           Zm00001d017670  4        2.68976   Shoot apex 1
ZmTHX33               Zm00001d025275  4        2.39519   Ovule B73
ZmOrphan204           Zm00001d033662  4        2.32754   Embryo 14DAP
ZmGLK14               Zm00001d039637  4        2.25996   Ovule B73
ZmOrphan349           Zm00001d009085  4        2.23045   Ovaries Drought stressed
ZmWRKY39              Zm00001d045375  4        1.88172   Maize EmbryoSac1 1
ZmGRAS73              Zm00001d006744  4        1.78771   Developing Leaf
ZmWRKY20              Zm00001d009698  4        1.72815   Leaves 20 days old seedling gc
ZmHB81                Zm00001d036510  4        1.55948   20 days old leaves seedling field
ZmWRKY75              Zm00001d029564  4        1.54071   Embryo 14DAP
ZmNAC107              Zm00001d010144  4        1.5026    B73 pollen
ZmMYB140              Zm00001d020569  4        1.48782   Seedling shoots
ZmHB93                Zm00001d007525  4        1.39382   Bundle sheath cells
ZmOrphan35            Zm00001d040728  4        1.36545   mature silk
ZmZHD18               Zm00001d052396  4        1.29681   post emergence cob
ZmMYB98               Zm00001d031515  4        1.29323   pre pollination tassel
ZmGRAS27              Zm00001d029474  4        1.2798    mature silk
ZmOrphan95            Zm00001d003571  4        1.24965   5 DAP Seed
ZmWRKY26              Zm00001d043066  4        1.22998   post pollination
ZmMYB68               Zm00001d026186  4        1.20503   pre pollination tassel
ZmNAC83               Zm00001d009244  4        1.18135   pre pollination tassel
ZmZIM22               Zm00001d028313  4        1.16145   Maize EmbryoSac1 1
ZmWRKY99              Zm00001d040554  4        1.13445   Leaf Meristems
ZmGRAS25              Zm00001d003657  4        1.11334   B73 pollen
ZmTHXL3               Zm00001d047589  4        1.08044   pre emergence cob
ZmEREB111             Zm00001d053195  4        0.900053  Seedling shoots
ZmJMJ2                Zm00001d051964  4        0.790959  Maize b1 control
ZmVOZ3                Zm00001d041158  4        0.656234  Ovaries Drought stressed
ZmTCP17               Zm00001d025250  4        0.6485    Maize b1 1d
ZmOrphan100           Zm00001d012767  4        0.634135  Mesophyll cells 400 1


Table A1 (continued): List of Uncloned TF Genes and Gene Model Rankings

Suggested Model Name  V4 Gene Name    Ranking  FPKM(1)   Highest Expressed Tissue(2)
ZmZIM9                Zm00001d035382  4        0.599971  Maize b1 control
ZmTUB13               Zm00001d047019  4        0.471189  B73 pollen
ZmE2F14               Zm00001d038664  4        0.319135  Ovaries 1 DAP
ZmMADS55              Zm00001d041587  4        0.30828   Ovaries Drought stressed
ZmE2F15               Zm00001d026355  4        0.277418  Mature leaf
ZmE2F10               Zm00001d045365  4        0.263148  DAP 10
ZmNAC14               Zm00001d020446  4        0.261145  mature silk
ZmPHD20               Zm00001d038561  4        0.137384  DAP 10
ZmbHLH75              no gene found   3        #N/A      #N/A
ZmEREB37              no gene found   3        #N/A      #N/A
ZmMYB18               no gene found   3        #N/A      #N/A
ZmMYBR109             no gene found   3        #N/A      #N/A
ZmGATA34              Zm00001d033945  3        158.844   25 DAP embryo
ZmDOF12               Zm00001d013783  3        135.274   B73 pollen
ZmEREB18              Zm00001d018081  3        129.357   20 days old leaves seedling field
ZmMYB143              Zm00001d030737  3        40.981    Embryo 14DAP
ZmBZR7                Zm00001d006677  3        31.8563   Embryo 14DAP
ZmWRKY19              Zm00001d008794  3        30.1587   Seedling shoots
ZmbHLH167             Zm00001d003677  3        29.7652   Endosperm 14DAP
ZmEREB57              Zm00001d052102  3        27.0538   Bundle sheath cells
ZmARF26               Zm00001d012731  3        26.8173   pre emergence cob
ZmEREB119             Zm00001d031673  3        20.0207   Ovaries Drought stressed
ZmMYB116              Zm00001d036577  3        18.1643   B73 pollen
ZmOrphan286           Zm00001d046753  3        14.794    Tassel primordia 1
ZmABI16               Zm00001d017618  3        14.7058   mature silk
ZmDBP2                Zm00001d018394  3        14.4547   Seedling shoots
ZmOrphan55            Zm00001d025855  3        14.2171   DAP 10
ZmbZIP56              Zm00001d040500  3        12.7518   Ovaries Drought stressed
ZmCOL9                Zm00001d051684  3        12.3433   mature silk
ZmOrphan217           Zm00001d022590  3        12.281    Bundle sheath cells
ZmLOB1                Zm00001d048401  3        10.8051   B73 pollen
ZmGATA8               Zm00001d023541  3        8.68801   Endosperm 14DAP
ZmGLK2                Zm00001d046174  3        5.43243   mature silk
ZmMYB29               Zm00001d047579  3        5.32687   Seedling shoots
ZmDOF35               Zm00001d035651  3        5.04037   Leaf primordia
ZmTHX7                Zm00001d028768  3        4.90184   Endosperm 14DAP
ZmMYB151              Zm00001d043611  3        4.85039   Ovule B73
ZmGLK54               Zm00001d042892  3        4.75058   Whole anthers


Table A1 (continued): List of Uncloned TF Genes and Gene Model Rankings

Suggested Model Name  V4 Gene Name    Ranking  FPKM(1)   Highest Expressed Tissue(2)
ZmbZIP28              Zm00001d015846  3        4.56214   Seedling shoots
ZmNAC56               Zm00001d020982  3        4.45393   Seedling shoots
ZmWRKY98              Zm00001d051328  3        4.44082   pre emergence cob
ZmbHLH163             Zm00001d031044  3        4.16311   Embryo 14DAP
ZmOrphan58            Zm00001d035846  3        3.75944   Seedling Roots
ZmGRAS47              Zm00001d041498  3        3.64202   Whole anthers
ZmMYB161              Zm00001d014989  3        3.60831   mature silk
ZmCADR14              Zm00001d019101  3        3.55306   Developing Leaf
ZmTCP37               Zm00001d005763  3        3.23777   Seedling shoots
ZmOrphan132           Zm00001d047384  3        2.77604   pre emergence cob
ZmWRKY49              Zm00001d008578  3        2.66206   DAP 10
ZmGLK35               Zm00001d005087  3        2.42134   Ovule B73
ZmTHX36               Zm00001d034953  3        2.39819   pre emergence cob
ZmMYBR112             Zm00001d010087  3        2.29201   mature silk
ZmE2F2                Zm00001d007384  3        2.23038   Pollen1 1
ZmHB35                Zm00001d002249  3        2.16163   Embryo 14DAP
ZmDOF33               Zm00001d012963  3        1.95298   post emergence cob
ZmGLK30               Zm00001d048448  3        1.9117    Shoot apex 1
ZmOrphan246           Zm00001d038457  3        1.89761   Leaves 20 days old seedling gc
ZmOrphan180           Zm00001d034151  3        1.77674   Ovaries Drought stressed
ZmBZR9                Zm00001d027587  3        1.75881   25 DAP endosperm
ZmOFP5                Zm00001d029718  3        1.74629   Leaf Meristems Drought stressed
ZmNAC85               Zm00001d025483  3        1.57291   pre pollination tassel
ZmYAB12               Zm00001d033508  3        1.36755   DAP 10
ZmNAC45               Zm00001d048044  3        1.34083   Mesophyll cells 400 1
ZmNAC77               Zm00001d050870  3        1.32896   Mesophyll cells 400 1
ZmABI9                Zm00001d030908  3        1.16455   pre emergence cob
ZmTHX25               Zm00001d047722  3        1.15842   mature silk
ZmMYBR46              Zm00001d042849  3        1.11528   25 DAP endosperm
ZmARF38               Zm00001d025871  3        1.09344   mature silk
ZmMADS58              Zm00001d038015  3        1.04809   B73 pollen
ZmMYBR110             Zm00001d044311  3        1.01735   Ear primordia 1
ZmOrphan209           Zm00001d008563  3        0.948232  mature silk
ZmOrphan272           Zm00001d012994  3        0.93538   Shoot apex 1
ZmOrphan231           Zm00001d047807  3        0.919724  Embryo 14DAP
ZmMYBR65              Zm00001d026456  3        0.880657  mature silk
ZmWRKY41              Zm00001d043063  3        0.786776  Seedling shoots
ZmNAC64               Zm00001d028178  3        0.607509  Ovaries Drought stressed


Table A1 (continued): List of Uncloned TF Genes and Gene Model Rankings

Suggested Model Name  V4 Gene Name    Ranking  FPKM(1)   Highest Expressed Tissue(2)
ZmGLK12               Zm00001d036802  3        0.553246  25 DAP endosperm
ZmMADS48              Zm00001d031591  3        0.534461  pre emergence cob
ZmTCP13               Zm00001d048591  3        0.434074  Ovaries 1 DAP
ZmOrphan116           Zm00001d051874  3        0.41876   pre emergence cob
ZmGRAS35              Zm00001d002023  3        0.406028  Ovaries Drought stressed
ZmGRAS36              Zm00001d030838  3        0.33333   Developing Leaf
ZmOrphan129           Zm00001d044406  3        0.310335  Ovaries 1 DAP
ZmTCP22               Zm00001d051154  3        0.294386  mature silk
ZmWRKY4               Zm00001d044682  3        0.272659  Leaf Meristems
ZmHSF12               Zm00001d046204  3        0.271539  Developing Leaf
ZmTCP34               Zm00001d002818  3        0.25543   Tassel primordia 1
ZmTCP42               Zm00001d031597  3        0.184498  Seedling shoots
ZmE2F1                Zm00001d032741  3        0.100717  Seedling Roots
ZmMYBR105             Zm00001d003496  3        0.093174  25 DAP endosperm
ZmMYBR30              Zm00001d019429  3        0         Whole anthers
ZmNLP11               Zm00001d053437  3        0         Whole anthers
ZmABI15               no gene found   2        #N/A      #N/A
ZmbHLH18              Zm00001d013370  2        23.6069   Seedling shoots
ZmbZIP5               Zm00001d014710  2        15.942    Seedling Roots
ZmOrphan13            Zm00001d011062  2        13.0515   25 DAP endosperm
ZmEREB67              Zm00001d047339  2        12.189    Ovaries Drought stressed
ZmWRKY31              Zm00001d034084  2        9.72489   Mature leaf
ZmSBP4                Zm00001d019793  2        9.17498   Bundle sheath cell 400 1
ZmWRKY35              Zm00001d012508  2        6.98204   Seedling shoots
ZmbHLH42              Zm00001d032213  2        5.64226   Endosperm 14DAP
ZmbHLH78              Zm00001d031167  2        3.09465   Seedling shoots
ZmMYBR15              Zm00001d032107  2        2.98164   mature silk
ZmZIM5                Zm00001d009438  2        2.90703   Leaf Meristems Drought stressed
ZmbZIP32              Zm00001d024285  2        2.49544   Ovaries Drought stressed
ZmOrphan17            Zm00001d036803  2        2.2541    Embryo 14DAP
ZmHB16                Zm00001d010998  2        1.53132   mature silk
ZmHB65                Zm00001d010998  2        1.53132   mature silk
ZmZIM31               Zm00001d016316  2        1.32004   Leaves 20 days old seedling gc
ZmLOB32               Zm00001d047752  2        1.23271   20 days old leaves seedling field
ZmWRKY18              Zm00001d008190  2        1.18154   post pollination
ZmEREB9               Zm00001d037531  2        1.16876   pre pollination tassel
ZmMYBR23              Zm00001d036883  2        1.09348   25 DAP endosperm
ZmNAC31               Zm00001d038289  2        1.04013   mature silk


Table A1 (continued): List of Uncloned TF Genes and Gene Model Rankings

Suggested Model Name  V4 Gene Name    Ranking  FPKM(1)   Highest Expressed Tissue(2)
ZmbHLH50              Zm00001d027497  2        0.874058  20 days old leaves seedling field
ZmMYB111              Zm00001d005784  2        0.774659  pre pollination tassel
ZmCADR4               Zm00001d039957  2        0.771821  Mature leaf
ZmMYBR68              Zm00001d025823  2        0.731021  Seedling shoots
ZmOrphan321           Zm00001d014963  2        0.596322  Developing Leaf
ZmCA2P2               Zm00001d013856  2        0.566808  Bundle sheath cells
ZmOFP40               Zm00001d025534  2        0.560629  Shoot apex 1
ZmMYB114              Zm00001d011739  2        0.526889  Maize b3 1d
ZmCSD2                Zm00001d025647  2        0.524531  Seedling Roots
ZmWRKY118             Zm00001d045283  2        0.519121  mature silk
ZmGLK19               Zm00001d040630  2        0.456065  Ovule B73
ZmMYB147              Zm00001d010933  2        0.398061  Leaves 20 days old seedling gc
ZmGRAS18              Zm00001d035873  2        0.396355  Developing Leaf
ZmE2F17               Zm00001d003755  2        0.393131  Mesophyll cells
ZmMYB144              Zm00001d051117  2        0.390069  Mature leaf
ZmCADR8               Zm00001d012445  2        0.374858  mature silk
ZmOrphan329           Zm00001d019519  2        0.306854  mature silk
ZmOrphan258           Zm00001d004953  2        0.268216  pre pollination tassel
ZmTCP35               Zm00001d048674  2        0.236695  Maize EmbryoSac1 1
ZmTHX28               Zm00001d031391  2        0.21333   B73 pollen
ZmPHD24               Zm00001d022583  2        0.19635   Mature leaf
ZmMYBR16              Zm00001d040385  2        0         Whole anthers
ZmTCP36               Zm00001d025295  2        0         Whole anthers
ZmARF37               no gene found   1        #N/A      #N/A
ZmCADR10              no gene found   1        #N/A      #N/A
ZmCADR13              no gene found   1        #N/A      #N/A
ZmCSD1                no gene found   1        #N/A      #N/A
ZmMTERF20             no gene found   1        #N/A      #N/A
ZmOrphan232           Zm00001d000139  1        #N/A      #N/A
ZmZIM11               no gene found   1        #N/A      #N/A
ZmEREB162             Zm00001d038907  1        19.8848   Ovaries Drought stressed
ZmbHLH31              Zm00001d021172  1        18.5722   Embryo 14DAP
ZmbHLH98              Zm00001d046346  1        13.0956   mature silk
ZmPLATZ14             Zm00001d047250  1        6.98786   Maize EmbryoSac1 1
ZmbHLH64              Zm00001d031260  1        4.68612   Embryo 14DAP
ZmGATA10              Zm00001d048391  1        3.59438   Embryo 14DAP
ZmEREB174             Zm00001d042717  1        3.55659   Leaf Meristems Drought stressed
ZmMYBR25              Zm00001d018320  1        3.10674   P6 Wt1 1


Table A1 (continued): List of Uncloned TF Genes and Gene Model Rankings

Suggested Model Name  V4 Gene Name    Ranking  FPKM(1)   Highest Expressed Tissue(2)
ZmMYBR36              Zm00001d002482  1        3.01863   Seedling shoots
ZmbZIP31              Zm00001d031194  1        2.7965    Leaf Meristems
ZmEREB77              Zm00001d043116  1        2.78119   Embryo 14DAP
ZmPLATZ1              Zm00001d028594  1        2.19089   Ovaries Drought stressed
ZmbHLH59              Zm00001d007536  1        2.13049   DAP 10
ZmEREB186             Zm00001d008872  1        1.69038   Embryo 14DAP
ZmLOB5                Zm00001d041205  1        1.58317   Bundle sheath cells
ZmEREB169             Zm00001d031498  1        1.48619   Embryo 14DAP
ZmbHLH119             Zm00001d016257  1        1.45365   Leaves 20 days old seedling gc
ZmNAC73               Zm00001d018773  1        1.38384   Maize b2 1d
ZmbHLH15              Zm00001d021988  1        1.28278   B73 pollen
ZmZIM8                Zm00001d004277  1        1.17081   Seedling shoots
ZmABI11               Zm00001d030908  1        1.16455   pre emergence cob
ZmOrphan253           Zm00001d023585  1        1.13472   Leaves 20 days old seedling gc
ZmHB44                Zm00001d014629  1        0.931996  Whole anthers
ZmOrphan97            Zm00001d048990  1        0.709489  Shoot field2 1
ZmMYB20               Zm00001d039700  1        0.656667  Leaves 20 days old seedling gc
ZmbHLH124             Zm00001d037749  1        0.650694  Seedling Roots
ZmEREB188             Zm00001d029679  1        0.622322  Maize b2 control
ZmLOB35               Zm00001d035498  1        0.500213  Seedling shoots
ZmOFP8                Zm00001d003521  1        0.497078  Seedling Roots
ZmHB101               Zm00001d008942  1        0.443745  25 DAP embryo
ZmOrphan216           Zm00001d003074  1        0.397376  mature silk
ZmLOB42               Zm00001d018832  1        0.394934  Whole anthers
ZmOrphan101           Zm00001d048989  1        0.314069  Developing Leaf
ZmPHD18               Zm00001d008232  1        0.293283  Seedling Roots
ZmOrphan144           Zm00001d049777  1        0.202776  Maize b3 1d
ZmHB90                Zm00001d050487  1        0.163277  Embryo 14DAP
ZmbHLH110             Zm00001d027753  1        0.15895   Leaves 20 days old seedling gc
ZmTUB12               Zm00001d012373  1        0.145106  20 days old leaves seedling field
ZmOrphan299           Zm00001d028069  1        0.13386   mature silk
ZmOrphan324           Zm00001d028069  1        0.13386   mature silk
ZmOrphan164           Zm00001d028066  1        0.097082  mature silk
ZmABI6                no gene found   0        #N/A      #N/A
ZmCA5P18              no gene found   0        #N/A      #N/A
ZmGATA14              no gene found   0        #N/A      #N/A
ZmHB114               no gene found   0        #N/A      #N/A
ZmMTERF4              no gene found   0        #N/A      #N/A


Table A1 (continued): List of Uncloned TF Genes and Gene Model Rankings

Suggested Model Name  V4 Gene Name    Ranking  FPKM(1)   Highest Expressed Tissue(2)
ZmOrphan103           no gene found   0        #N/A      #N/A
ZmOrphan205           no gene found   0        #N/A      #N/A
ZmOrphan228           no gene found   0        #N/A      #N/A
ZmTCP32               no gene found   0        #N/A      #N/A
ZmZIM6                no gene found   0        #N/A      #N/A
ZmEREB24              Zm00001d002025  0        53.4892   Embryo 14DAP
ZmEREB105             Zm00001d002618  0        20.0555   Leaf Meristems Drought stressed
ZmbHLH4               Zm00001d003964  0        15.6105   Embryo 14DAP
ZmbHLH89              Zm00001d017804  0        12.8343   Embryo 14DAP
ZmEREB153             Zm00001d019096  0        5.40048   B73 pollen
ZmEREB84              Zm00001d038216  0        5.17212   Seedling shoots
ZmbHLH70              Zm00001d039459  0        3.68108   Whole anthers
ZmbHLH77              Zm00001d011212  0        2.82664   Maize b3 1d
ZmbZIP55              Zm00001d013172  0        2.80869   25 DAP embryo
ZmEREB88              Zm00001d021214  0        2.55054   Seedling shoots
ZmABI42               Zm00001d010077  0        2.03092   pre emergence cob
ZmEREB21              Zm00001d025409  0        1.36677   Embryo 14DAP
ZmEREB78              Zm00001d049903  0        1.3085    25 DAP embryo
ZmEREB66              Zm00001d017478  0        1.17087   Embryo 14DAP
ZmLOB37               Zm00001d036435  0        1.14798   Developing Leaf
ZmbHLH58              Zm00001d041351  0        0.982701  Ovaries Drought stressed
ZmEREB28              Zm00001d023535  0        0.959646  mature silk
ZmEREB140             Zm00001d050137  0        0.951091  Mature leaf
ZmOFP20               Zm00001d044357  0        0.795494  Mesophyll cells
ZmEREB43              Zm00001d007236  0        0.745696  25 DAP endosperm
ZmbHLH38              Zm00001d048786  0        0.533978  25 DAP endosperm
ZmGATA1               Zm00001d005005  0        0.408309  Maize EmbryoSac1 1
ZmEREB157             Zm00001d016848  0        0.394054  Whole anthers
ZmbHLH93              Zm00001d033729  0        0.357326  Developing Leaf
ZmbHLH28              Zm00001d033413  0        0.209772  Maize EmbryoSac1 1
ZmMYBR12              no gene found            #N/A      #N/A
ZmOrphan236           no gene found            #N/A      #N/A
ZmTHX43               no gene found            #N/A      #N/A

(1) FPKM = fragments per kilobase of transcript per million mapped reads, reported for the tissue in which the highest expression of this gene was observed.
(2) Highest Expressed Tissue = maize tissue for which the FPKM value is greatest among the 66 RNA-Seq datasets used in this study. DAP = days after pollination.
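The two footnoted columns can be illustrated with a minimal Python sketch. It assumes the standard FPKM definition (fragments divided by transcript length in kilobases and by millions of mapped fragments); the function names and all example numbers are hypothetical and are not taken from the pipeline used in this study.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Return FPKM under the standard definition (an assumption here):
    fragments / (length in kb * mapped fragments in millions)."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (fragments -> millions)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def highest_expressed_tissue(fpkm_by_tissue):
    """Return (tissue, FPKM) for the tissue with the greatest FPKM."""
    tissue = max(fpkm_by_tissue, key=fpkm_by_tissue.get)
    return tissue, fpkm_by_tissue[tissue]


# Hypothetical values for one gene model across three tissues.
example = {"Embryo 14DAP": 12.5, "Mature leaf": 2.1, "B73 pollen": 0.0}
print(highest_expressed_tissue(example))
print(fpkm(500, 2000, 25_000_000))
```

Each table row then corresponds to the pair returned by `highest_expressed_tissue` for one gene model, evaluated over all datasets.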


Appendix B

A2. Table A2: List of Uncloned CR Genes and Gene Model Rankings

Suggested Model Name  V4 Gene Name    Ranking  FPKM(1)   Highest Expressed Tissue(2)
ZmTRAF42              Zm00001d053354  5        281.871   B73 pollen
ZmSNF2 22             Zm00001d032801  5        184.884   25 DAP endosperm
ZmSNF2 1              Zm00001d009312  5        103.339   Ovule B73
ZmRcd1L7              Zm00001d024312  5        97.2494   Mature leaf
ZmSNF2 3              Zm00001d040831  5        79.2283   pre emergence cob
ZmSNF2 38             Zm00001d045635  5        74.537    Ovaries Drought stressed
ZmHAG37               Zm00001d048286  5        67.9893   Mesophyll cells 400 1
ZmSNF2 23             Zm00001d014977  5        65.9293   Ovaries Drought stressed
ZmTRAF21              Zm00001d039469  5        63.0006   mature silk
ZmSNF2 25             Zm00001d007978  5        49.014    25 DAP embryo
ZmSNF2 12             Zm00001d006798  5        47.8059   mature silk
ZmMED26 18            Zm00001d007037  5        47.5156   25 DAP endosperm
ZmIWS1                Zm00001d034606  5        47.4853   Endosperm 14DAP
ZmBAF60 18            Zm00001d010235  5        47.1088   Ovule B73
ZmSNF2 16             Zm00001d002950  5        46.7017   Mesophyll cells
ZmTRAF16              Zm00001d013741  5        45.8251   Leaf Meristems
ZmMED26 16            Zm00001d047030  5        41.1723   Ovaries Drought stressed
ZmMED26 21            Zm00001d047030  5        41.1723   Ovaries Drought stressed
ZmRB3                 Zm00001d031678  5        39.0623   pre emergence cob
ZmTRAF20              Zm00001d012660  5        36.9759   Seedling Roots
ZmSNF2 2              Zm00001d047471  5        36.719    Endosperm 14DAP
ZmSNF2 10             Zm00001d033827  5        34.8038   Shoot apex 1
ZmIAA42               Zm00001d009532  5        31.6494   Mesophyll cells
ZmHAG32               Zm00001d019343  5        31.15     Mesophyll cells
ZmSNF2 11             Zm00001d024816  5        30.2666   Ovaries Drought stressed
ZmMED26 19            Zm00001d053189  5        29.9905   Whole anthers
ZmHAG2                Zm00001d028074  5        29.7896   Developing Leaf
ZmSNF2 37             Zm00001d029180  5        29.7027   pre emergence cob
ZmSNF2 17             Zm00001d007089  5        27.8003   Endosperm 14DAP
ZmTRAF5               Zm00001d046968  5        26.2048   Ovule B73

Table A2 (continued): List of Uncloned CR Genes and Gene Model Rankings

Suggested Model Name  V4 Gene Name    Ranking  FPKM(1)   Highest Expressed Tissue(2)
ZmTRAF3               Zm00001d007522  5        25.7332   Seedling shoots
ZmRcd1L6              Zm00001d011483  5        24.8644   DAP 10
ZmSNF2 4              Zm00001d022405  5        24.7387   Ovaries Drought stressed
ZmTRAF6               Zm00001d042234  5        24.5874   Ovule B73
ZmLIM12               Zm00001d037252  5        23.2639   mature silk
ZmHMG12               Zm00001d037615  5        23.2528   pre pollination tassel
ZmSNF2 24             Zm00001d039160  5        22.9851   DAP 10
ZmTRAF27              Zm00001d010341  5        22.3432   pre emergence cob
ZmSNF2 15             Zm00001d022267  5        22.1331   pre emergence cob
ZmBAF60 22            Zm00001d034159  5        21.7606   Ovaries Drought stressed
ZmBAF60 24            Zm00001d034159  5        21.7606   Ovaries Drought stressed
ZmTRAF33              Zm00001d052837  5        21.4427   Ovaries Drought stressed
ZmDDT5                Zm00001d011560  5        20.8473   Ovule B73
ZmLIM5                Zm00001d033297  5        20.8172   Seedling shoots
ZmSNF2 39             Zm00001d049605  5        20.6959   pre pollination tassel
ZmLUG1                Zm00001d034929  5        20.6338   DAP 10
ZmBAF60 20            Zm00001d043434  5        20.6139   pre emergence cob
ZmSNF2 36             Zm00001d049455  5        20.3462   pre emergence cob
ZmTRAF19              Zm00001d034400  5        20.0287   Embryo 14DAP
ZmBSD8                Zm00001d012665  5        18.9137   5 DAP Seed
ZmBAF60 6             Zm00001d007039  5        18.3512   pre emergence cob
ZmRB4                 Zm00001d052666  5        18.0096   Ovaries Drought stressed
ZmBAF60 15            Zm00001d052873  5        17.6402   Embryo 14DAP
ZmRB1                 Zm00001d007407  5        17.6392   Ovaries Drought stressed
ZmSNF2 6              Zm00001d021607  5        17.229    Endosperm 14DAP
ZmHAG33               Zm00001d021956  5        15.7623   Ovaries Drought stressed
ZmRB2                 Zm00001d052695  5        15.1372   25 DAP embryo
ZmBAF60 8             Zm00001d013863  5        15.0979   Tassel primordia 1
ZmTAZ4                Zm00001d015209  5        14.9144   Ovaries Drought stressed
ZmSNF2 5              Zm00001d050642  5        12.6889   pre emergence cob
ZmFHA11               Zm00001d013060  5        12.4946   Ovaries Drought stressed
ZmTAZ2                Zm00001d041378  5        12.4838   Ovaries Drought stressed
ZmDDT3                Zm00001d016827  5        11.6981   Ovaries Drought stressed
ZmDDT6                Zm00001d023391  5        11.5867   pre emergence cob
ZmSNF2 14             Zm00001d002656  5        11.2499   Seedling shoots
ZmSNF2 29             Zm00001d042080  5        10.56     Embryo 14DAP
ZmFHA17               Zm00001d048454  5        10.1134   pre emergence cob
ZmSNF2 8              Zm00001d048552  5        9.80315   pre emergence cob


Table A2 (continued): List of Uncloned CR Genes and Gene Model Rankings

Suggested Model Name  V4 Gene Name    Ranking  FPKM(1)   Highest Expressed Tissue(2)
ZmSNF2 32             Zm00001d017660  5        9.26942   DAP 10
ZmSNF2 30             Zm00001d022436  5        9.16256   Ovaries Drought stressed
ZmBSD1                Zm00001d007479  5        8.76287   Embryo 14DAP
ZmTRAF40              Zm00001d037420  5        7.99061   Leaf primordia
ZmRcd1L9              Zm00001d048089  5        7.92135   DAP 10
ZmDDT1                Zm00001d004925  5        7.46545   DAP 10
ZmSNF2 35             Zm00001d022046  5        7.40418   Maize EmbryoSac1 1
ZmTRAF26              Zm00001d011923  5        6.97089   Ovule B73
ZmSNF2 19             Zm00001d022313  5        6.7466    Tassel primordia 1
ZmBAF60 1             Zm00001d032471  5        6.68892   Ovaries Drought stressed
ZmSNF2 18             Zm00001d013828  5        6.63089   Ovaries Drought stressed
ZmSNF2 21             Zm00001d038113  5        5.093     Tassel primordia 1
ZmRcd1L3              Zm00001d012483  5        3.9188    Ovaries Drought stressed
ZmSNF2 33             Zm00001d015595  5        1.40287   25 DAP embryo
ZmMED26 5             Zm00001d033689  5        0.159533  Mature leaf
ZmBAF60 23            Zm00001d004008  4        57.2514   Ovaries Drought stressed
ZmFHA5                Zm00001d003513  4        47.3235   Leaves 20 days old seedling gc
ZmSNF2 7              Zm00001d050061  4        37.4817   Ear primordia 1
ZmMED26 9             Zm00001d047028  4        24.5582   Embryo 14DAP
ZmBSD7                Zm00001d022353  4        24.1603   Ovule B73
ZmIAA35               Zm00001d013154  4        21.5318   Seedling shoots
ZmFHA6                Zm00001d004810  4        20.5244   25 DAP embryo
ZmBAF60 16            Zm00001d012755  4        16.5402   Tassel primordia 1
ZmDDT4                Zm00001d022417  4        10.0237   DAP 10
ZmHAG35               Zm00001d046823  4        5.33392   Shoot apex 1
ZmSWI3 3              Zm00001d018102  4        3.54891   Embryo 14DAP
ZmBAF60 5             Zm00001d037851  4        3.10686   Ovaries Drought stressed
ZmIAA43               Zm00001d004697  4        1.68922   Seedling shoots
ZmIAA41               Zm00001d024008  4        1.54166   Maize b3 1d
ZmBAF60 13            Zm00001d042605  4        1.33779   Seedling Roots
ZmBAF60 17            Zm00001d048510  3        15.4353   Seedling shoots
ZmBAF60 19            Zm00001d027334  3        13.4045   DAP 10
ZmIAA39               Zm00001d050972  3        12.4502   Embryo 14DAP
ZmRcd1L5              Zm00001d046373  3        3.21041   Ovaries Drought stressed
ZmHAG31               Zm00001d018677  3        2.9543    Embryo 14DAP
ZmBSD5                Zm00001d015822  3        2.03369   B73 pollen
ZmIAA36               Zm00001d027495  3        1.86068   Ovaries 1 DAP
ZmSNF2 27             Zm00001d011744  3        1.19451   pre pollination tassel


Table A2 (continued): List of Uncloned CR Genes and Gene Model Rankings

Suggested Model Name  V4 Gene Name    Ranking  FPKM(1)   Highest Expressed Tissue(2)
ZmHMG5                Zm00001d051142  3        1.06595   Ovaries Drought stressed
ZmTAZ5                Zm00001d030914  3        0.942006  pre pollination tassel
ZmIAA40               Zm00001d044818  3        0.807772  Ovaries Drought stressed
ZmMED26 22            Zm00001d033694  3        0.672141  Ovaries Drought stressed
ZmSNF2 34             Zm00001d034078  3        0.087529  Ovaries Drought stressed
ZmRcd1L2              Zm00001d023759  2        29.4152   Bundle sheath cells
ZmSNF2 20             Zm00001d012385  2        1.18592   DAP 10
ZmMED26 11            Zm00001d023418  2        0.702199  Mature leaf
ZmMED26 20            Zm00001d023406  2        0.335621  Seedling Roots
ZmHMG11               Zm00001d036355  1        27.7841   Ovaries Drought stressed
ZmRcd1L8              Zm00001d010921  1        0.844537  25 DAP embryo
ZmTRAF13              Zm00001d024351  1        0.714506  Mesophyll cells
ZmRcd1L10             Zm00001d046678  1        0.376147  25 DAP endosperm
ZmMED26 4             Zm00001d023401  1        0.278033  5 DAP Seed
ZmTRAF23              Zm00001d024915  1        0.271855  pre emergence cob
ZmBAF60 3             none
ZmLIM3                none
ZmLUG3                none
ZmMED26 2             none
ZmMED26 3             none
ZmMED26 6             none
ZmMED26 7             none
ZmMED26 8             none
ZmTRAF9               none

(1) FPKM = fragments per kilobase of transcript per million mapped reads, reported for the tissue in which the highest expression of this gene was observed.
(2) Highest Expressed Tissue = maize tissue for which the FPKM value is greatest among the 66 RNA-Seq datasets used in this study. DAP = days after pollination.


Appendix C

A3. Table A.3: List of Syntelogs of EREBP genes in Maize

ZmEREBP V4 name  Syntelog 1      Syntelog 2      Syntelog 3
ZmEREB1          ZmEREB44        none            none
ZmEREB2          ZmEREB195       ZmEREB198       Zm00001d051340
ZmEREB3          ZmEREB36        ZmEREB79        none
ZmEREB4          ZmEREB137       none            none
ZmEREB5          ZmEREB66        Zm00001d026182  ZmEREB135
ZmEREB6          ZmEREB104       none            none
ZmEREB7          ZmEREB139       none            none
ZmEREB8          ZmEREB138       none            none
ZmEREB9          none            none            none
ZmEREB10         ZmEREB127       none            ZmEREB188
ZmEREB11         none            none            none
ZmEREB12         ZmEREB34        none            none
ZmEREB13         ZmEREB136       none            none
ZmEREB14         none            none            none
ZmEREB15         none            none            none
ZmEREB16         none            ZmEREB127       none
ZmEREB17         ZmEREB18        ZmEREB109       ZmEREB31
ZmEREB18         ZmEREB17        ZmEREB109       ZmEREB204
ZmEREB19         Zm00001d014565  none            none
ZmEREB20         none            none            none
ZmEREB21         none            none            none
ZmEREB22         none            none            none
ZmEREB23         ZmEREB65        none            none
ZmEREB24         none            none            none
ZmEREB25         none            none            none
ZmEREB26         none            none            none
ZmEREB27         Zm00001d044950  ZmEREB136       none
ZmEREB28         none            none            none
ZmEREB29         ZmEREB158       none            none
ZmEREB30         ZmEREB176       Zm00001d026184  ZmEREB6


Table A3 (continued): List of Syntelogs of EREBP genes in Maize

ZmEREBP V4 name  Syntelog 1      Syntelog 2      Syntelog 3
ZmEREB31         ZmEREB17        none            none
ZmEREB32         ZmEREB157       none            none
ZmEREB33         none            none            none
ZmEREB34         ZmEREB13        none            none
ZmEREB35         none            none            none
ZmEREB36         ZmEREB3         none            none
ZmEREB37         none            none            none
ZmEREB38         ZmEREB124       none            none
ZmEREB39         Zm00001d036996 (not an EREB)  ZmEREB144  none
ZmEREB40         none            none            none
ZmEREB41         Zm00001d013170  ZmEREB184       none
ZmEREB42         none            none            none
ZmEREB43         none            none            none
ZmEREB44         ZmEREB1         none            none
ZmEREB45         none            none            none
ZmEREB46         none            none            none
ZmEREB47         ZmEREB156       none            none
ZmEREB48         ZmEREB38        none            none
ZmEREB49         ZmEREB162       none            none
ZmEREB50         none            none            none
ZmEREB51         none            none            none
ZmEREB52         none            none            none
ZmEREB53         none            none            none
ZmEREB54         ZmEREB209       none            none
ZmEREB55         none            none            none
ZmEREB56         none            none            none
ZmEREB57         Zm00001d018288  none            none
ZmEREB58         none            none            none
ZmEREB59         none            none            none
ZmEREB60         none            none            none
ZmEREB61         ZmEREB107       ZmEREB2         Zm00001d051340
ZmEREB62         ZmEREB201       none            none
ZmEREB63         none            none            none
ZmEREB64         Zm00001d005203  none            none
ZmEREB65         ZmEREB115       none            none
ZmEREB66         ZmEREB5         Zm00001d026182  none
ZmEREB67         none            none            none
ZmEREB68         none            none            none

Table A3 (continued): List of Syntelogs of EREBP genes in Maize

ZmEREBP V4 name  Syntelog 1      Syntelog 2      Syntelog 3
ZmEREB69         none            none            none
ZmEREB70         none            none            none
ZmEREB71         ZmEREB101       none            none
ZmEREB72         none            none            none
ZmEREB73         none            none            none
ZmEREB74         none            none            none
ZmEREB75         none            none            none
ZmEREB76         ZmEREB170       none            none
ZmEREB77         none            none            none
ZmEREB78         none            none            none
ZmEREB79         none            none            none
ZmEREB80         ZmEREB208       none            none
ZmEREB81         Zm00001d010987  none            none
ZmEREB82         none            none            none
ZmEREB83         ZmEREB147       none            none
ZmEREB84         ZmEREB63        none            none
ZmEREB85         none            none            none
ZmEREB86         ZmEREB154       none            none
ZmEREB87         ZmEREB107       ZmEREB2         none
ZmEREB88         ZmEREB65        none            none
ZmEREB89         ZmEREB137       none            none
ZmEREB90         none            none            none
ZmEREB91         ZmEREB125       none            none
ZmEREB92         Zm00001d038584 (not an EREB)  ZmEREB158  none
ZmEREB93         Zm00001d038584 (not an EREB)  ZmEREB158  none
ZmEREB94         none            none            none
ZmEREB95         none            none            none
ZmEREB96         ZmEREB101       none            none
ZmEREB97         ZmEREB209       ZmEREB54        none
ZmEREB98         none            none            none
ZmEREB99         none            none            none
ZmEREB100        none            Zm00001d010593 (not an EREB)  ZmEREB164
ZmEREB101        ZmEREB71        none            none
ZmEREB102        ZmEREB202       none            none
ZmEREB103        ZmEREB186       none            none
ZmEREB104        ZmEREB176       Zm00001d026182  none

Table A3 (continued): List of Syntelogs of EREBP genes in Maize

ZmEREBP V4 name  Syntelog 1      Syntelog 2      Syntelog 3
ZmEREB105        ZmEREB115       none            none
ZmEREB106        none            none            none
ZmEREB107        Zm00001d051340  Zm00001d029940  ZmEREB61
ZmEREB108        none            none            none
ZmEREB109        none            ZmEREB204       ZmEREB18
ZmEREB110        ZmEREB81        none            none
ZmEREB111        none            none            none
ZmEREB112        none            none            none
ZmEREB113        ZmEREB133       none            none
ZmEREB114        none            none            none
ZmEREB115        ZmEREB65        none            none
ZmEREB116        none            none            none
ZmEREB117        none            none            none
ZmEREB118        ZmEREB192       none            none
ZmEREB119        none            none            none
ZmEREB120        none            none            none
ZmEREB121        none            none            none
ZmEREB122        none            none            none
ZmEREB123        none            none            none
ZmEREB124        ZmEREB38        none            none
ZmEREB125        ZmEREB91        none            none
ZmEREB126        ZmEREB162       none            none
ZmEREB127        ZmEREB10        ZmEREB68        none
ZmEREB128        none            none            none
ZmEREB129        none            none            none
ZmEREB130        none            none            none
ZmEREB131        none            none            none
ZmEREB132        none            none            none
ZmEREB133        ZmEREB113       ZmEREB174       none
ZmEREB134        Zm00001d012736 (not an EREB)  none  none
ZmEREB135        Zm00001d045262 (not an EREB)  none  none
ZmEREB136        ZmEREB27        ZmEREB13        none
ZmEREB137        none            none            none
ZmEREB138        ZmEREB8         none            none
ZmEREB139        ZmEREB7         ZmEREB172       ZmEREB211
ZmEREB140        none            none            none
ZmEREB141        none            none            none

Table A3 (continued): List of Syntelogs of EREBP genes in Maize

ZmEREBP V4 name  Syntelog 1      Syntelog 2      Syntelog 3
ZmEREB142        none            none            none
ZmEREB143        none            none            none
ZmEREB144        ZmEREB39        none            none
ZmEREB145        Zm00001d046501  none            none
ZmEREB146        ZmEREB165       none            none
ZmEREB147        ZmEREB83        ZmEREB93        none
ZmEREB148        none            none            none
ZmEREB149        ZmEREB131       none            none
ZmEREB150        none            none            none
ZmEREB151        ZmEREB183       none            none
ZmEREB152        none            none            none
ZmEREB153        none            none            none
ZmEREB154        ZmEREB170       ZmEREB86        none
ZmEREB155        none            none            none
ZmEREB156        ZmEREB47        none            none
ZmEREB157        ZmEREB32        none            none
ZmEREB158        ZmEREB29        none            none
ZmEREB159        none            none            none
ZmEREB160        none            none            none
ZmEREB161        none            none            none
ZmEREB162        Zm00001d011639  ZmEREB49        ZmEREB126
ZmEREB163        none            none            none
ZmEREB164        none            none            none
ZmEREB165        ZmEREB146       none            none
ZmEREB166        none            none            none
ZmEREB167        none            none            none
ZmEREB168        none            none            none
ZmEREB169        none            none            none
ZmEREB170        ZmEREB76        ZmEREB154       none
ZmEREB171        none            none            none
ZmEREB172        ZmEREB211       none            ZmEREB139
ZmEREB173        ZmEREB4         none            none
ZmEREB174        ZmEREB133       none            none
ZmEREB175        none            none            none
ZmEREB176        ZmEREB104       Zm00001d026182  none
ZmEREB177        none            none            none
ZmEREB178        none            none            none
ZmEREB179        none            none            none
ZmEREB180        none            none            none

Table A3 (continued): List of Syntelogs of EREBP genes in Maize

ZmEREBP V4 name  Syntelog 1      Syntelog 2      Syntelog 3
ZmEREB181        none            none            none
ZmEREB182        none            none            none
ZmEREB183        ZmEREB151       none            none
ZmEREB184        Zm00001d013170  none            none
ZmEREB185        ZmEREB66        ZmEREB5         Zm00001d026182
ZmEREB186        ZmEREB103       none            none
ZmEREB187        none            none            none
ZmEREB188        Zm00001d027369  ZmEREB10        ZmEREB36
ZmEREB189        none            none            none
ZmEREB190        none            none            none
ZmEREB191        ZmEREB39        none            none
ZmEREB192        ZmEREB118       none            none
ZmEREB193        none            none            none
ZmEREB194        none            none            none
ZmEREB195        Zm00001d051340  none            none
ZmEREB196        none            none            none
ZmEREB197        ZmEREB212       none            none
ZmEREB198        ZmEREB195       ZmEREB2         none
ZmEREB199        none            none            none
ZmEREB200        none            none            none
ZmEREB201        ZmEREB62        none            none
ZmEREB202        ZmEREB102       none            none
ZmEREB203        none            none            none
ZmEREB204        ZmEREB109       none            ZmEREB18
ZmEREB205        ZmEREB97        none            none
ZmEREB206        none            none            none
ZmEREB207        none            none            none
ZmEREB208        ZmEREB80        none            none
ZmEREB209        ZmEREB54        none            none
ZmEREB210        none            none            none
ZmEREB211        ZmEREB172       none            ZmEREB139
ZmEREB212        ZmEREB197       none            none
Zm00001d014565   ZmEREB19        none            none
Zm00001d001907   none            none            none
Zm00001d044950   ZmEREB27        ZmEREB136       none
Zm00001d005309   none            none            none
Zm00001d013170   ZmEREB41        ZmEREB184       none
Zm00001d018288   ZmEREB57        none            none
Zm00001d005203   ZmEREB64        none            none

Table A3 (continued): List of Syntelogs of EREBP genes in Maize

ZmEREBP V4 name  Syntelog 1      Syntelog 2      Syntelog 3
Zm00001d010987   ZmEREB81        none            none
Zm00001d008665   none            none            none
Zm00001d051340   ZmEREB170       Zm00001d029940  ZmEREB61
Zm00001d023662   none            none            none
Zm00001d025477   none            none            none
Zm00001d046501   ZmEREB145       none            none
Zm00001d026182   ZmEREB176       ZmEREB104       none
Zm00001d026184   ZmEREB30        ZmEREB176       ZmEREB6
Zm00001d052732   none            none            none
Zm00001d027686   none            none            none
Zm00001d039907   none            none            none
Zm00001d029940   ZmEREB107       Zm00001d051340  ZmEREB61
Zm00001d030631   none            none            none
Zm00001d031500   none            none            none
Zm00001d043409   none            none            none
Zm00001d028069   none            none            none
Zm00001d011639   ZmEREB162       ZmEREB126       ZmEREB49


Appendix D

A4. Table A.4: List of EREB Genes with likely deletions of syntelog

ZmEREBP gene Suggested Name  ZmEREBP gene Genome V4 Name
ZmEREB16                     Zm00001d017592
ZmEREB35                     Zm00001d028070
ZmEREB63                     Zm00001d042593
ZmEREB67                     Zm00001d047339
ZmEREB75                     Zm00001d022153
ZmEREB95                     Zm00001d027687
ZmEREB100                    Zm00001d038446
ZmEREB109                    Zm00001d020267
ZmEREB120                    Zm00001d028066
ZmEREB123                    Zm00001d028068
ZmEREB130                    Zm00001d027878
ZmEREB132                    Zm00001d050948
ZmEREB148                    Zm00001d009103
ZmEREB152                    Zm00001d046913
ZmEREB153                    Zm00001d019096
ZmEREB155                    Zm00001d052365
ZmEREB159                    Zm00001d034605
ZmEREB161                    Zm00001d048004
ZmEREB166                    Zm00001d019098
ZmEREB194                    Zm00001d050948
ZmEREB203                    Zm00001d046651
not yet named                Zm00001d001907
not yet named                Zm00001d005309
not yet named                Zm00001d008665
not yet named                Zm00001d023662
not yet named                Zm00001d025477
not yet named                Zm00001d027686
not yet named                Zm00001d028069
not yet named                Zm00001d030631
not yet named                Zm00001d031500
not yet named                Zm00001d039907
not yet named                Zm00001d043409
not yet named                Zm00001d052732

94

Appendix E

A5. CIRCOS Ribbon File

Ribbon file

link7 Zm1 1512739 71156142
link7 Zm9 120958204 158434932
link11 Zm2 3960741 31859308
link11 Zm4 137672274 178754978
link12 Zm2 176557898 225332235
link12 Zm7 75832084 180849739
link14 Zm2 475655 91200559
link14 Zm10 103053341 149592885
link15 Zm3 144730719 203799797
link15 Zm8 161298708 180937799
link16 Zm3 556690 50767878
link16 Zm8 331984 30792440
link17 Zm3 204187310 229575256
link17 Zm8 142040518 161006758
link20 Zm4 137672274 178754978
link20 Zm2 3960741 31859308
link22 Zm4 96152230 175154893
link22 Zm5 163949244 222952781
link23 Zm4 175534354 186715273
link23 Zm5 207420135 221229776
link24 Zm4 238155357 246610977
link24 Zm5 67944117 107588075
link27 Zm5 190229506 215075596
link27 Zm2 8973698 31358799
link28 Zm5 163949244 222952781
link28 Zm4 96152230 175154893
link29 Zm5 67944117 107588075
link29 Zm4 238155357 246610977
link32 Zm6 85217578 94242014
link32 Zm5 42443580 66254151
link34 Zm6 25567692 46917536
link34 Zm8 134763584 139414743
link42 Zm7 19095363 23597808
link42 Zm2 162544641 165866469
link43 Zm8 161298708 180937799
link43 Zm3 144730719 203799797
link44 Zm8 331984 30792440
link44 Zm3 556690 50767878
link45 Zm8 142040518 161006758
link45 Zm3 204187310 229575256
link37 Zm6 110850147 122121222
link37 Zm9 62041 12045082
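The ribbon file appears to describe each link as two records that share a link ID, one per chromosomal end (ID, chromosome, start, end). As a hedged illustration of that layout, the Python sketch below groups the records by ID and reports the span of each end. It is not part of the thesis workflow; the function name `parse_links` is hypothetical, and the four sample records are copied from the ribbon file above.

```python
from collections import defaultdict

# Four records copied from the ribbon file: two ends per link ID.
ribbon_text = """\
link7 Zm1 1512739 71156142
link7 Zm9 120958204 158434932
link16 Zm3 556690 50767878
link16 Zm8 331984 30792440
"""


def parse_links(text):
    """Group ribbon records by link ID and require exactly two ends each."""
    ends = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        link_id, chrom, start, end = line.split()
        ends[link_id].append((chrom, int(start), int(end)))
    unpaired = [link_id for link_id, v in ends.items() if len(v) != 2]
    if unpaired:
        raise ValueError(f"links without exactly two ends: {unpaired}")
    return {link_id: tuple(v) for link_id, v in ends.items()}


links = parse_links(ribbon_text)
for link_id, (a, b) in links.items():
    # Report each end as chromosome plus span length in base pairs.
    print(link_id, a[0], a[2] - a[1] + 1, "bp <->", b[0], b[2] - b[1] + 1, "bp")
```

A check like this catches a record whose partner was lost, for example at a page break, before the file is handed to Circos.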

Appendix F

A6. CIRCOS Gene coordinates file

Gene coordinates file

Zm4 183995749 183996699 EREB17
Zm5 213912954 213913979 EREB18
Zm10 140619551 140620363 EREB2
Zm2 21962709 21963548 EREB198
Zm2 10906553 10907221 EREB97
Zm10 142611393 142612061 EREB205
Zm8 176892933 176893658 EREB83
Zm3 191580135 191580848 EREB147
Zm4 239645670 239646296 EREB8
Zm9 5518177 5519226 EREB31
Zm7 22130022 22130944 EREB64
Zm8 12513040 12513613 EREB118
Zm3 21877778 21879721 EREB192
Zm6 91341686 91342279 EREB19
Zm2 5505054 5505812 EREB47
Zm10 145983350 145983982 EREB156
Zm5 193684947 193686296 EREB76
Zm4 149655139 149656476 EREB170
Zm6 105660981 105662401 EREB145
Zm3 216213660 216214559 EREB208
Zm2 213547650 213548585 EREB62
Zm8 136011962 136013101 EREB110
Zm5 216112775 216113527 EREB136
Zm9 143638759 143639535 EREB173
Zm5 102583546 102584181 EREB138
Zm8 152569157 152570068 EREB80
Zm7 165528436 165529362 EREB201
Zm6 30481710 30486777 EREB81
Zm4 181353441 181354229 EREB13
Zm3 210069089 210070258 EREB126
Zm1 38080422 38081126 EREB4
Zm2 163456179 163457054 Zm00001d005203
Zm5 53599524 53600105 Zm00001d014565
Zm8 157135997 157137214 Zm00001d011639

Appendix G

A7. CIRCOS Ideogram file

Ideogram.conf

Show = yes
default = 0.01r
break = 0.04r
axis_break_at_edge = yes
axis_break = yes
axis_break_style = 2
condition = var(label) eq "EREB4"
label_font = bold
label_color = red
label_size = 24p
radius = 0.57r
thickness = 20p
fill = yes
stroke_color = dgrey
stroke_thickness = 2p
show_label = yes
label_font = default
label_radius = dims(image,radius) - 70p
label_with_tag = yes
label_size = 40
label_parallel = yes
label_case = lower
label_format = eval(sprintf("chr%s",var(label)))

Appendix H

A8. CIRCOS configuration file

Circos.conf

karyotype = karyotype.zeamaysV4.txt
<> <> <> <>
file = ribbon.EREBlinkscorr2.txt
radius = 0.99r
crest = 1
ribbon = yes
flat = yes
#stroke_color = blue
#stroke_thickness = 2
color = black
bezier_radius = 0.1r
thickness = 1
condition = 1
color = eval(sprintf("chr%s_a3",substr(var(chr2),2)))
z = eval(remap_int(var(size2),0,10e6,0,-100))
type = text
#color = black
file = text.ZmEREBgenescorr2.txt
r1 = 1r+200p
r0 = 1r
label_font = default
label_size = 12p
show_links = yes
link_dims = 0p,0p,70p,0p,10p
link_thickness = 2p
link_color = grey
<>
radius* = 750p
auto_alpha_steps* = 10

Appendix I

A9. R code for Coexpression analysis

setwd("Desktop")
library(RColorBrewer)
library(gplots)
library(Hmisc)
library(corrplot)
library(reshape2)

# Read the expression data and log2-transform it (pseudocount of 1)
maize_data <- read.table("EREBTarget.txt", header = TRUE)
log2_maize_data <- log2(maize_data + 1)
mat_log2_maize_data <- as.matrix(log2_maize_data)

# Maize calculated distances (rows = genes, columns = samples)
dist_row <- dist(mat_log2_maize_data, method = "euclidean")
dist_col <- dist(t(mat_log2_maize_data), method = "euclidean")

# Maize clustering
cluster_row <- hclust(dist_row, method = "average")
cluster_col <- hclust(dist_col, method = "single")

# Heatmap palette and (unused) break points
my_palette <- colorRampPalette(c('black', 'dodgerblue3', 'gold2', 'red'))(299)
colors <- c(seq(0.0, 2, length = 100), seq(2, 5, length = 100),
            seq(5, 7, length = 100), seq(7, 10, length = 100))

# Heatmap Maize
hm_all <- heatmap.2(mat_log2_maize_data, margins = c(15, 15), col = my_palette,
                    #breaks = colors,
                    density.info = "none", trace = "none", scale = "none",
                    Rowv = as.dendrogram(cluster_row),
                    Colv = as.dendrogram(cluster_col),
                    main = " QTeller Expression Data")

## Calculated correlation matrix
# Maize
cor_matrix <- rcorr(t(mat_log2_maize_data), type = c("spearman"))

## To restructure data matrix
melt_cor_matrix <- melt(cor_matrix$r)

# Co-expression plot maize
cor_matrix$r[is.na(cor_matrix$r)] <- 0  # to change NA values in maize matrix
corrplot(cor_matrix$r, method = "square", order = "hclust",
         addgrid.col = "white", tl.col = "black", tl.srt = 45,
         mar = c(0, 0, 1, 0))

# ++++++++++++++++++++++++++++
# flattenCorrMatrix
# ++++++++++++++++++++++++++++
# cormat : matrix of the correlation coefficients
# pmat : matrix of the correlation p-values
flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)
  data.frame(
    row = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor = (cormat)[ut],
    p = pmat[ut]
  )
}

# Writes table for P-values
table_qtellerpt1half_maize <- flattenCorrMatrix(cor_matrix$r, cor_matrix$P)
write.table(table_qtellerpt1half_maize, "tabla_leaf_maize.csv", sep = "\t", quote = FALSE)
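To make the flattening step concrete, the sketch below reproduces the logic of flattenCorrMatrix in Python: it walks the upper triangle of a symmetric correlation matrix and emits one (row, column, cor, p) record per gene pair. This is an illustrative sketch only, not the code used in the thesis; the function name `flatten_corr_matrix`, the gene labels, and all numeric values are made up for the example.

```python
def flatten_corr_matrix(names, cormat, pmat):
    """Return one (row, column, cor, p) tuple per upper-triangle cell."""
    records = []
    n = len(names)
    # Column-major traversal, matching the order R yields for upper.tri().
    for j in range(n):
        for i in range(j):
            records.append((names[i], names[j], cormat[i][j], pmat[i][j]))
    return records


# Hypothetical 3-gene example (values invented for illustration).
names = ["geneA", "geneB", "geneC"]
cormat = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.5],
    [-0.2, 0.5, 1.0],
]
pmat = [
    [None, 0.01, 0.6],
    [0.01, None, 0.2],
    [0.6, 0.2, None],
]

flat = flatten_corr_matrix(names, cormat, pmat)
for row, column, cor, p in flat:
    print(row, column, cor, p, sep="\t")
```

For n genes the result has n(n-1)/2 records, so each gene pair appears once and the diagonal is excluded.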