Abstract

COKER, JEFFREY SCOTT. The systemic response to fire damage in tomato: A case study in the development of methods for gene expression analysis using sequence data. (Under the direction of Dr. Eric Davies.)

Fire is a natural component of most terrestrial ecosystems and can act as a local wound stimulus to plants. The ultimate goal of this work was to characterize the array of transcripts that systemically accumulate in plants after fire damage. Before this could be accomplished, substantial development of methods for gene expression analysis using sequence data was necessary. This involved developing methods for identifying contamination in DNA sequence data (Chapter 2), identifying over 78,000 false sequences in GenBank and several thousand more in the indica rice genome (Chapter 2), developing a novel method for identifying housekeeping controls using sequence data (Chapter 3), performing relative expression analyses for 127 potential housekeeping control transcripts (Chapter 3), and characterizing 23 transcripts that encode all 13 subunits of vacuolar H+-ATPases in tomato plants (Chapter 4). A subtractive cDNA library served as a starting point to identify and characterize 9 novel tomato transcripts systemically up-regulated in leaves in the first hour after a distant leaf is flame wounded (Chapter 5). Real-time RT-PCR using leaf RNA isolated at different times after flaming showed that the most common pattern of transcript accumulation was an increase within 30 to 60 minutes, followed by a return to basal levels within 3 hours. Expression analyses also showed that most up-regulated transcripts were already present in unwounded tissues. A total of 46 different transcripts were identified from the subtractive cDNA library (Chapter 6). Compared with the tomato transcriptome as a whole, these 46 transcripts are very highly conserved in plants. The vast majority fell into 5 classes: enzymes of general metabolism; protein synthesis, modification, and transport; transcription; membrane transport; and photosynthesis and respiration. At least half of the transcripts have been previously associated with wounding or stress, suggesting that the systemic response to fire damage shares components with other wound and stress responses. On the other hand, 30% of transcripts were associated with photosynthesis and respiration, suggesting that part of the response to fire damage differs notably from other wound and stress responses. Conclusions and future directions are presented in Chapter 7.

THE SYSTEMIC RESPONSE TO FIRE DAMAGE IN TOMATO PLANTS: A CASE STUDY IN THE DEVELOPMENT OF METHODS FOR GENE EXPRESSION ANALYSIS USING SEQUENCE DATA

by JEFFREY SCOTT COKER

A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

DEPARTMENT OF BOTANY

Raleigh 2004

APPROVED BY:

Dr. Judy Thomas, Advisory committee member
Dr. Jack Wheatley, Advisory committee member

Dr. Dominique Robertson, Advisory committee member
Dr. Chris Brown, Advisory committee member

Dr. Eric Davies, Chair of Advisory Committee

Dedication

The dissertation of Jeffrey S. Coker, which completes the Degree of Doctor of Philosophy, is dedicated to the educators of Plymouth, North Carolina.

Leafie Bryant, Carolyn Watlington, Julia Towe, Joyce Hardison, Rita Rhodes, Donna Whitfield, Frances Callander, A. Willingham, Ann Bland, Robert Moore, Doris Downing, Kevin Cutler, Ruth Pharr, Leroy Bland, Beth Thompson, Kathy Stanfield, Shirley Thomas, Kerry Koeppl, Sally Woolard, Donald Rote, Glenda Smith, Pam Benson, Bea Waters, Becky Brown, Judy Wynn, Robert Cody, Ms. Wilkins, Alma Phifer, Senya Norman, Frances Jones, Roxanna Brown, Marian Floyd, Judy Bragg, Ed Clark, Mary Kay Bradshaw, Geraldine Rodgers, Janet Swain, Charlene Evans, Joyce Hardison, Susan Owens, Dianne Staten, Hector Palacios, Donald Hassell, Susie Jakeman, Louis Spencer, Victor Davis, Marty Alligood, Patrick Parr, Mr. and Mrs. Sermons, Michelle Stewart, Julius Walker, Robert Cody

Biography

Jeffrey Scott Coker was born the son of Jerry and Debra Coker in the small town of Plymouth, North Carolina. His interest in plants probably stems from a family of gardeners, pulp and paper engineers, and wood-workers, as well as a community where farms, forests, ball fields, and swamps are plentiful. Jeffrey attended Davidson College, where he studied biology and ancient Greek and Roman civilizations, and played baseball. After graduation, he worked for one year at the Helen Paesler School in Raleigh, NC, teaching high school biology, chemistry, and calculus, as well as middle school science/math. It was during this year that he found a passion for teaching science and decided to pursue it at the college level.

Jeffrey entered graduate school at N.C. State University in 1999 as an RA/TA in the Botany Department, where he taught laboratories in Botany and Biotechnology and co-taught a new Whole Physiology course. He earned an M.Ed. in Science Education in the spring of 2001 and formally became a Ph.D. student in Botany (under Dr. Eric Davies) shortly thereafter. His teaching at N.C. State has been recognized with the CALS Outstanding Teaching Assistant Award, the Martha Sue Sebastian Memorial Award for Excellence in Teaching, a GSA Outstanding Teaching Award, an Alcoa Teaching Fellowship, and a NACTA Graduate Student Teaching Award. Student researchers under his supervision have been recognized locally and nationally for their work.

While in Raleigh, Jeffrey met a wonderful girl named Beth, and they were married on December 20, 2003, in Greenville, N.C. Beginning in August of 2004, Jeffrey will be an Assistant Professor in the Biology Department at Elon University. He looks forward to a successful career in teaching and research, and to spending many happy years with Beth.

Acknowledgements

There are many people who have supported me over the last five years in various capacities. My committee members have been extremely supportive, and for that I am most grateful. Dr. Eric Davies has been an outstanding research advisor in every sense. His openness to new ideas, support of my work, willingness to integrate scientific and educational pursuits, careful review of manuscripts, daily friendliness, and general guidance have all been invaluable. Perhaps the most distinct impression Eric has left on me is the amount of effort he spends helping to advance the lives and careers of his students and colleagues. I cannot think of a more admirable quality. We have had many conversations about how many students do not fully appreciate a teacher or mentor until years later. Let me assure you that I am fully aware of what an outstanding advisor I have had. Dr. Judy Thomas has been an excellent mentor and friend, and was an instrumental part of my success in graduate school. She believed in me when others were skeptical, and set me on the right path more times than I can count. Dr. Chris Brown has been a role model for me in terms of professionalism, teaching, and the leadership of research and teaching collaborations. He introduced me to concepts of Space Biology which changed the way I look at my own discipline. I credit Dr. Niki Robertson with shaping my earliest thoughts about biotechnology, and value her thoughts very highly. Her enthusiastic and insightful approaches to science and life are contagious among her students. Dr. Jack Wheatley’s presence on my committee is especially meaningful because he represents good teaching and educational scholarship. I am thankful for his guidance, patience, and insightful reviews of my teaching.

A number of people worked alongside me in the laboratory and provided daily assistance, for which I am thankful. Dr. Raul Salinas was especially helpful and patient.

Most of my “co-workers” were high school and undergraduate student researchers who always made the lab a more enjoyable place. In particular, I am thankful to have worked with Derek Jones regarding vacuolar ATPases and enjoyed both his enthusiasm and friendship. Other student researchers included Katie Grant, Jessica Staley, Holly Cline, Ryan Parks, Ashwynn Stanger, John Pollard, and Turqouise Ross.

Dr. Gerald Van Dyke has been an invaluable teaching mentor and friend. His excitement about teaching and commitment to students have inspired me to seek excellence in the classroom. My time at N.C. State would not have been the same without the friendship and conversation of Dr. Isaac Bruck. I am also thankful for the Botany administrative staff, especially Sue Vitello and Vicki Lemaster, who dealt with many issues on my behalf.

I am blessed with a loving family which has provided support in many forms. Mom, Dad, Grandmother, Chris, Eric, Laura, Sheila, Mike, Josh, and Debbie have all played important roles in my life. On at least two occasions, family members (Eric and Mom) helped me to overcome significant research difficulties.

Finally, I could not dream of having a more supportive wife. Beth has been at my side through virtually every step of my dissertation research. She has assisted me in the field, in the laboratory, and in the classroom. She has read my papers, inspected tables and figures, listened to whole lectures just so I could practice and, perhaps most importantly, encouraged me to work long hours when deadlines approached or I became really excited about something (which happens frequently). She must be, as we joke, the “best chemical engineering botanist” in the country. Any success I have must also be hers.

Part of the research and travel associated with this dissertation was funded by grants from the Plant Molecular Biology Consortium, Sigma Xi, and the American Society of Plant Biologists. Acknowledgements of a more technical nature are provided at the end of each chapter.

Table of Contents

List of Tables

List of Figures

1. Introduction

2. Sequence quality control

A. Identifying adaptor contamination when mining DNA sequence data
Abstract; Acknowledgments; References

B. Cleaning data mined from the indica rice genome
Abstract; SmaI-linearized pUC18 plasmid; Regions of other cloning vector(s); Phytophthora; Conclusions; References

C. Correction of the 5’ end of the human com1/p8 gene
Letter; References

3. Selection of candidate housekeeping controls in tomato plants using EST data
Abstract; Introduction; Materials and methods; Data mining; Calculation of relative expression levels; Calculation of fold ranges and transcript variation; Results and discussion; Acknowledgements; References

4. Identification, conservation, and relative expression of V-ATPase cDNAs in tomato plants
Abstract; Introduction; Materials and methods; Identification of V-ATPase ESTs; Relative expression analyses; Gene nomenclature; Results and discussion; 23 V-ATPase genes identified in tomato; Hexamer rings are highly conserved; Relative expression levels in different tissues; V-ATPase relative expression increases during fruit ripening; Conclusion; Acknowledgements; References

5. Identification, accumulation, and functional prediction of novel tomato transcripts systemically up-regulated after fire damage
Abstract; Introduction; Results; Discussion; CSWR-1 Acyl carrier protein; CSWR-2 Adenylyl-sulfate reductase; CSWR-3 Unknown protein; CSWR-4 Photosystem II oxygen-evolving complex protein 3; CSWR-5 Putative anion:sodium symporter; CSWR-6 Unknown wound/stress protein; CSWR-7 Chloroplast-specific ribosomal protein; CSWR-8 Alpha/beta fold family protein; CSWR-9 Histidine triad family protein; Materials and Methods; Plant material, growth conditions, and tissue collection; Subtractive cDNA library construction, screening and sequencing; DNA sequence analysis and data mining; Verification of consensus sequences; Real-time RT-PCR assays; Relative expression analyses; Polypeptide sequence analysis; Acknowledgements; Literature cited

6. Fire damage causes the systemic up-regulation of a set of highly conserved transcripts in tomato plants
Abstract; Introduction; Materials and methods; Plant material, growth conditions, and tissue collection; Subtractive cDNA library construction, screening and sequencing; DNA sequence analysis; Comparisons with the Arabidopsis genome; Results; Overview of the subtractive cDNA library; Library validation; Conservation between tomato and Arabidopsis; Discussion; Transcripts common to other wound and stress responses; Transcripts not common to other wound and stress responses; References

7. Conclusions and future directions

Conclusions and future directions regarding the development of methods for gene expression analysis using sequence data: Blueprint for a universal sequencing-based method of gene expression analysis
Abstract; Disadvantages of binding-radiation methods; Advantages of sequencing methods; Obstacles and specifications for a universal sequencing-based method; References

Conclusions and future directions regarding the biology of systemic responses to fire damage

Appendices

Appendix 1: V-ATPase amino acid alignments

Appendix 2: Annotated sequences for novel tomato transcripts/proteins

Appendix 3: Perspectives on student research experiences in plant biology

Overview

A. Involvement of plant biologists in undergraduate and high school student research
Abstract; Introduction; Methods; Member participation; Advantages and disadvantages of research training; References

B. A national perspective on mentoring student researchers in plant biology
Abstract; Introduction; Materials and methods; Results and discussion; Acknowledgements; References

C. Evaluation of teaching and research experiences undertaken by botany majors at N.C. State University
Abstract; Introduction; Methods; Results and discussion; Acknowledgements; Literature cited

List of Tables

Chapter 2-A

Table 1. Sequences and search parameters to identify entries in GenBank contaminated by 7 commercial adaptor sequences

Chapter 2-B

Table 1. Matches in the indica genome with the pUC18 SmaI site

Table 2. Examples of internal pUC18 artifacts (≥14 bp) in indica scaffolds

Table 3. Examples of Phytophthora-like sequences in the indica genome

Chapter 3

Table 1. Summary of tentative consensus sequences (TCs) from the TIGR TGI that were analyzed for their potential as housekeeping control genes

Table 2. Highest-ranking housekeeping control genes in various tomato plant tissues

Chapter 4

Table 1. V-ATPase genes in Arabidopsis and tomato

Chapter 5

Table 1. Sequence extension and polypeptide deduction for unidentifiable tomato cDNA fragments that are "candidates for the systemic wound response" (CSWR)

Table 2. PCR primers specific to 9 novel tomato cDNAs that were used to verify putative open reading frame sequences and perform real-time RT-PCR experiments

Chapter 6

Table 1. Summary of a subtractive cDNA library containing transcripts systemically up-regulated in the hour after fire damage

Chapter 7

Table 1. Specifications for a universal sequencing-based method of gene expression analysis

Appendix 1

Table 1. Subunit c amino acid identities

Table 2. Subunit c″ amino acid identities

Table 3. Subunit d amino acid identities

Table 4. Subunit e amino acid identities

Table 5. Subunit A amino acid identities

Table 6. Subunit B amino acid identities

Table 7. Subunit C amino acid identities

Table 8. Subunit D amino acid identities

Table 9. Subunit E amino acid identities

Table 10. Subunit F amino acid identities

Table 11. Subunit G amino acid identities

Table 12. Subunit H amino acid identities

Appendix 3-A

Table 1. ASPB member involvement and satisfaction with supporting undergraduate and high school research

Table 2. Frequencies of ASPB member comments regarding the potential advantages of supporting undergraduate (UG) and high school (HS) research

Table 3. Frequencies of ASPB member comments regarding the potential disadvantages of supporting undergraduate (UG) and high school (HS) research

Appendix 3-B

Table 1. Population demographics of respondents to a survey of the American Society of Plant Biologists (ASPB)

Table 2. Percentages of respondents who have mentored various numbers of undergraduates

Table 3. Respondent perceptions of institutional incentives for mentoring student researchers

List of Figures

Chapter 1

Figure 1. Strategy to identify and analyze cDNAs up-regulated in tomato leaf tissue during a systemic wound response to fire damage

Chapter 2-A

Figure 1. The path from sequencing a cDNA to an improperly edited sequence

Chapter 2-B

Figure 1. Matches of 20 bp, 19 bp, 18 bp, etc. in the indica genome corresponding to the pUC18 SmaI site

Chapter 3

Figure 1. Percentage of tomato cDNA libraries (n = 27) which contain ESTs for given genes within various fold ranges of relative expression

Chapter 4

Figure 1. Amino acid identity of tomato V-ATPase subunits compared to Arabidopsis

Figure 2. Relative expression levels of V-ATPase ESTs in different cDNA libraries of the TIGR TGI

Figure 3. Relative expression levels of individual V-ATPase cDNAs

Figure 4. Cumulative relative expression levels of tomato V-ATPase subunits

Figure 5. Similarity between V-ATPase relative expression in developing tomatoes and V-ATPase activity in developing grapes (grape data from Terrier et al., 2001)

Chapter 5

Figure 1. Strategy to identify and characterize cDNAs up-regulated in tomato leaf tissue during a systemic wound response to fire damage

Figure 2. Confirmation of the existence of 9 putative consensus sequences for unknown tomato cDNAs

Figure 3. Expressed sequence tag analysis of 9 cDNAs that are candidates for the systemic wound response (CSWR)

Figure 4. Organ-specific relative abundance of CSWR-1 through CSWR-9 in unwounded tomato plants

Figure 5. Systemic transcript accumulation of 9 tomato cDNAs (CSWR-1 through CSWR-9) in leaf 4 after flame wounding leaf 3

Figure 6. Structural and functional prediction of 9 tomato proteins, encoded by CSWR-1 through CSWR-9

Chapter 6

Figure 1. Conservation of transcript sequences between tomato and Arabidopsis

Figure 2. Phenylpropanoid biosynthesis from phenylalanine

Figure 3. The methyl cycle and ethylene synthesis

Chapter 7

Figure 1. Comparisons that can be made between 2 transcript populations using binding-radiation (a) and sequencing (b) methods

Figure 2. Theoretical blueprint for a universal sequencing-based method of gene expression analysis

Appendix 1

Figure 1. Alignment of c subunits in tomato

Figure 2. Alignment of c subunits in tomato and Arabidopsis

Figure 3. Alignment of c″ subunits in tomato

Figure 4. Alignment of c″ subunits in tomato and Arabidopsis

Figure 5. Alignment of d subunits in tomato and Arabidopsis

Figure 6. Alignment of e subunits in tomato

Figure 7. Alignment of e subunits in tomato and Arabidopsis

Figure 8. Alignment of A subunits in tomato and Arabidopsis

Figure 9. Alignment of B subunits in tomato

Figure 10. Alignment of B subunits in tomato and Arabidopsis

Figure 11. Alignment of C subunits in tomato and Arabidopsis

Figure 12. Alignment of D subunits in tomato and Arabidopsis

Figure 13. Alignment of E subunits in tomato

Figure 14. Alignment of E subunits in tomato and Arabidopsis

Figure 15. Alignment of F subunits in tomato and Arabidopsis

Figure 16. Alignment of G subunits in tomato

Figure 17. Alignment of G subunits in tomato and Arabidopsis

Figure 18. Alignment of H subunits in tomato and Arabidopsis

Appendix 3-A

Figure 1. ASPB member comments regarding potential advantages of supporting undergraduate researchers

Figure 2. ASPB member comments regarding potential advantages of supporting high school researchers

Figure 3. Number of ASPB member comments regarding undergraduate and high school research

Appendix 3-B

Figure 1. Percentages of plant biologists who mentored various numbers of undergraduates in different “length of their mentoring career” categories

Figure 2. Total number of undergraduates mentored by plant biologists of different academic ranks at land-grant universities, other research universities, and primarily undergraduate institutions (PUIs)

Figure 3. Percentages of plant biologists of different academic rank at land-grant universities, other research universities, and primarily undergraduate institutions (PUIs) who perceive institutional incentives for mentoring undergraduate researchers

Appendix 3-C

Figure 1. Average levels of student involvement in typical teaching-related activities

Figure 2. Average levels of student involvement in typical research-related activities

Figure 3. Student perceptions of their research and/or teaching experience

Chapter 1

Introduction

The ultimate goal of this dissertation was to identify transcripts that are systemically up-regulated in response to fire damage in tomato plants. To accomplish this, several sequencing-based methods of gene expression analysis had to be developed and refined before a subtractive cDNA library could be meaningfully analyzed. Chapter 2 presents methods for improving sequence quality control and identifying false sequences. A method for identifying adaptor contaminants was developed and used to identify over 78,000 false sequences in GenBank. One of the many contaminated sequences was from the human p8/com1 gene, which has implications for breast cancer research. Other types of sequence contamination include sequences from vectors and foreign organisms (pathogens, etc.), which were found at several thousand locations in the indica rice genome. Chapter 3 presents a novel method for identifying and evaluating housekeeping genes using sequence data. Using this method with tomato sequences, relative expression analyses were performed for 127 potential housekeeping control transcripts. These analyses identified candidate housekeeping transcripts, which were later used as controls in real-time RT-PCR experiments (Chapter 5).

To characterize the array of transcripts that systemically accumulate in plants after fire damage, a subtractive cDNA library was used to isolate and identify them; this work is described in Chapters 4-6. Chapter 4 (with Appendix 1) presents the identification and characterization of 23 transcripts that encode all 13 subunits of vacuolar H+-ATPases in tomato plants. This study stemmed from the discovery that one of the transcripts from the library encoded a c subunit of vacuolar H+-ATPase. In Chapter 5 (with Appendix 2), the library served as a starting point to identify and characterize 9 novel tomato transcripts systemically up-regulated in leaves in the first hour after a distant leaf is flame wounded. Real-time RT-PCR using leaf RNA isolated at different times after flaming showed that the most common pattern of transcript accumulation was an increase within 30 to 60 minutes, followed by a return to basal levels within 3 hours. Expression analyses also showed that most up-regulated transcripts were already present in unwounded tissues. Structural and functional predictions were also performed for each of the 9 novel transcripts.

Chapter 6 describes all 46 different transcripts identified from the subtractive cDNA library. Compared with the tomato transcriptome as a whole, these 46 wound-up-regulated transcripts are very highly conserved. The vast majority fell into 5 classes: enzymes of general metabolism; protein synthesis, modification, and transport; transcription; membrane transport; and photosynthesis and respiration. At least half of the transcripts have been previously associated with wounding or stress, suggesting that the systemic response to fire damage shares components with other wound and stress responses. On the other hand, 30% of transcripts were associated with photosynthesis and respiration, suggesting that part of the response to fire damage differs notably from other wound and stress responses. In addition to furthering knowledge of systemic responses to fire damage, Chapters 4-6 (and Appendices 1 and 2) demonstrate how sequence data can be used simultaneously for gene discovery and expression analysis.

Chapter 7 provides conclusions and future directions for gene expression analysis using sequence data and for the biology of systemic responses to fire damage. Future directions include a universal sequencing-based method of gene expression analysis, as well as experiments to determine whether the 46 transcripts give rise to proteins that actually function during the systemic response to fire damage.

Appendix 3 presents several educational studies on how to involve undergraduates and high school students in research projects such as the ones presented in this dissertation.

The overall flow of work for this dissertation is shown in Figure 1. Work began with a subtractive cDNA library containing tomato transcripts up-regulated during a systemic response to flame wounding. Tomato cDNA fragments were isolated from the library and sequenced. The sequences were then screened for various types of contamination (using methods developed in Chapter 2). BLAST searches of GenBank databases divided the sequences into 3 classes based on their similarity to known genes: known tomato genes, sequences homologous to known genes (but not known in tomato), and unidentifiable sequences. The unidentifiable cDNA fragments were then analyzed in much more detail. Using expressed sequence tags (ESTs) in public databases, the full-length open reading frames of the transcripts were assembled with the aid of bioinformatics tools. These full-length sequences were then checked experimentally by designing PCR primers, amplifying the sequences from a cDNA sample, and sequencing the products. The ESTs from public databases were also used to perform expression analyses. Using the full-length open reading frame sequences, extensive bioinformatics work was performed to predict the structures and functions of the putative proteins. Finally, real-time RT-PCR was performed over a 6-hour time course after flame wounding to better understand the kinetics of transcript accumulation. Housekeeping controls for the real-time RT-PCR experiments were chosen using the methods presented in Chapter 3.

[Flow diagram: subtractive cDNA library of tomato genes up-regulated during a systemic wound response → clone isolation and sequencing → sequence quality control (VecScreen; bacterial database searches) → BLAST searches of GenBank, dividing ESTs into ESTs from known tomato genes, ESTs homologous to known genes, and unidentifiable ESTs → relative expression analysis and sequence extension using the TIGR TGI → sequence verification (PCR and sequencing) → real-time RT-PCR (6-hr time course) with housekeeping controls; BLAST searches of GenBank; protein family analysis (PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMS); structural analysis (localization signals: TargetP; transmembrane regions: PHDhtm, HMMTOP; alpha helices/beta sheets: PROFsec; interacting proteins: DIP; coiled-coils/leucine zippers: COILS, 2ZIP).]

Figure 1. Strategy to identify and analyze cDNAs up-regulated in tomato leaf tissue during a systemic wound response to fire damage. Chapter 2 addresses issues of sequence quality control (light gray), Chapter 3 deals with selection of housekeeping controls (dark gray), and Chapters 4-6 present analyses beginning with the subtractive cDNA library and extending the length of the flow diagram.

Chapter 2

Sequence Quality Control

Jeffrey S. Coker and Eric Davies

Eric Davies provided guidance and editorial assistance.

This chapter is divided into three separate papers. Data associated with the first paper were reported to the National Center for Biotechnology Information in 2001, leading to the correction of numerous RefSeqs (curated gene sequences). The first paper has been accepted for publication in Biotechniques, and the second will be submitted. The third paper was published in 2002 in the journal Cancer Research 62, 4164-4165, and led to the correction of the human p8 cDNA sequence in GenBank.


Identifying adaptor contamination when mining DNA sequence data

Jeffrey S. Coker and Eric Davies

Department of Botany, North Carolina State University, Campus Box 7612, Raleigh, North Carolina 27695.

Abstract

Meaningful analysis of DNA sequences depends on the accuracy of the sequences themselves, so false sequences in public databases are a major concern for bioinformatics research. We describe a simple screen that has identified adaptor contamination in over 78,000 eukaryotic sequences in GenBank. Most of these entries were found in the GenBank EST databases, but 4,528 were found in the GenBank/EMBL/DDBJ/PDB “nr” database. Of a subset of 210 contaminated “nr” database entries, adaptor sequence was present in 82 (39%) as part of a gene or cDNA and in 11 (5%) as part of an open reading frame. Adaptor contamination extends beyond public databases, since 108 of the 210 “nr” entries are linked to peer-reviewed publications. Bioinformatics work that uses data mined from public sequence databases should include a simple check for adaptor contamination.

Detection of adaptor contamination is made far easier by knowing that over 99% of adaptor contaminants appear near the ends of sequences, are flanked by vector, or involve adaptor dimerization.

Analysis of DNA sequences can only be as correct as the sequences themselves, so contamination in public databases is a major concern for bioinformatics research. Here we describe a simple screen that identified adaptor contamination in over 78,000 eukaryotic sequences in GenBank. Because over 99% of adaptor contaminants appear near the ends of sequences, are flanked by vector, or involve adaptor dimerization, checking these three situations detects 99% of contaminated sequences (Fig. 1).

A contaminated sequence is defined as “one that does not faithfully represent the genetic information from the biological source organism/organelle because it contains one or more sequence segments of foreign origin” (http://www.ncbi.nlm.nih.gov/VecScreen/contam.html). Sources of contamination for nuclear DNA and cDNA include vector sequence (1-6), plasmid vector insertion sequences (7), impure tissue sources (8), faulty laboratory protocols (9-10), mitochondrial DNA (11), and ribosomal DNA/RNA (12). There is one published account of contamination due to adaptor sequences, where it was shown that commercial adaptor sequences matched the 5’ or 3’ end of 728 GenBank and EMBL sequences (13). Strategies to decrease contamination in database sequences have emphasized vector sequences (4-6, 8) and given little attention to adaptor contamination.

An adaptor is a short oligonucleotide that is ligated to the ends of cDNAs for incorporation into a vector cloning site (Fig. 1). Usually adaptors consist of several restriction sites, one blunt end (for ligation to cDNA), and one cohesive end (for ligation to a vector). Adaptors are frequently used in the construction of cDNA libraries and in generating cDNA ends using RACE (rapid amplification of cDNA ends) PCR.

The presence of adaptor sequences in organismal sequences in public databases has the potential to cause many different errors of interpretation (14,15), including the following:

- False hits for others using public databases.
- Added difficulty in identifying genes and joining contigs.
- Misconstruction of PCR primers, microarrays, probes, etc.
- Incorrect conclusions regarding evolution and differences between organisms.
- Incorrect conclusions about gene structure, mRNA splicing, and mRNA transport.
- Incorrect conclusions about protein sequence, structure, transport, and function.

To investigate adaptor contamination in public databases, BLASTn searches of GenBank (release 140.0; Feb. 15, 2004) eukaryotic sequences were performed using the search parameters shown in Table 1. These parameters returned only perfect matches (100% identity) with the respective adaptor sequences (Table 1). Three separate searches of the EST databases were performed for the Stratagene ZAP and Clontech P1/PN1 adaptors (human, mouse, and non-human/mouse ESTs were searched separately using the E-values in Table 1), because searching all ESTs simultaneously returned more hits than the server could process. Manual review of individual GenBank entries, literature review, and personal communications were used to investigate several hundred matches further. GenBank entries with adaptor contamination were also screened for vector contamination using VecScreen (www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html), the tool commonly used to screen GenBank submissions.
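The searches above used BLASTn with a 100% identity requirement. As a rough illustration only, a perfect-match screen of one sequence against one adaptor can be sketched in Python. This sketch swaps in an exact substring search, which is not the BLASTn procedure used here. The function names and the flanking "insert" sequence are hypothetical.

```python
# Exact-match adaptor screen (an illustrative stand-in for a 100%-identity BLASTn search).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_adaptor_hits(seq: str, adaptor: str) -> list[tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for every exact adaptor match.

    Both strands are searched. A palindromic adaptor would be reported once
    per strand.
    """
    seq = seq.upper()
    probes = (("+", adaptor.upper()), ("-", revcomp(adaptor.upper())))
    hits = []
    for strand, probe in probes:
        start = seq.find(probe)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(probe, start + 1)
    return sorted(hits)


# Clontech P1/PN1 search string from Table 1; the flanking sequence is invented.
PN1 = "TCGAGCGGCCGCCCGGGCAGGT"
entry = "GGCACG" + PN1 + "ATGGCTTCAAGC"
print(find_adaptor_hits(entry, PN1))  # [(6, '+')]
```

An exact search cannot tolerate sequencing errors or return the shorter partial matches discussed below. For real screening, a local-alignment tool with a tunable E-value remains the appropriate choice.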

The searches and subsequent analyses identified over 78,000 contaminated sequences in GenBank (Table 1). Most contaminated sequences were found in the GenBank expressed sequence tag (EST) database, but the “nr” database (which contains annotated genes, etc.) also contained 4,528 false sequences. There was also a large number of shorter matches with adaptors that were not returned under the search parameters in Table 1, making it evident that the actual number of contaminated sequences is much higher than shown in Table 1. Simply increasing the E-value will return these shorter matches.

Within the contaminated GenBank sequences, over 99% of adaptors were within 50 bp of an end, adjacent to a vector sequence match detected by VecScreen, or involved in dimerization (Fig. 1). The majority of matches not near the 5’ or 3’ end involved dimerization of Stratagene’s ZAP adaptor, as shown in Figure 1. We performed BLASTn searches using the full sequences of many GenBank entries that included putative dimer sequences in the gene or cDNA sequence. These searches typically returned GenBank entries matching the query on one side of the dimer but entirely different entries matching the other side. This suggests that the query sequences actually contained two unrelated sequences joined via dimerization. Such joining has the potential to create significant errors, especially since the dimer is often in the middle of sequences, where it is more likely to be interpreted as part of the open reading frame.
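The three-way rule described above (near an end, flanked by vector, or part of a dimer) can be written as a small classifier. This is a sketch under stated assumptions:

- Hit and vector coordinates are 0-based, end-exclusive intervals.
- Vector spans are supplied by a separate screen such as VecScreen.
- `max_gap` is a hypothetical tolerance for "adjacent".
- A hit is assigned to the first class it satisfies, although in practice the classes can overlap.

```python
# Sketch of the three contamination classes (hypothetical helper, not the authors' code).
# Stratagene ZAP dimer from Table 1. It equals its own reverse complement,
# so a single-strand substring test covers both orientations.
ZAP_DIMER = "CTCGTGCCGAATTCGGCACGAG"


def classify_hit(seq_len, hit_start, hit_end, vector_spans=(), hit_seq="",
                 end_window=50, max_gap=0):
    """Classify one adaptor hit; coordinates are 0-based and end-exclusive."""
    # 1) Within `end_window` bp of the 5' or 3' end of the entry.
    if min(hit_start, seq_len - hit_end) <= end_window:
        return "end"
    # 2) Overlapping or abutting a vector match (gap of at most max_gap bp).
    for v_start, v_end in vector_spans:
        if v_end + max_gap >= hit_start and hit_end + max_gap >= v_start:
            return "flanked_by_vector"
    # 3) Part of an adaptor dimer.
    if ZAP_DIMER in hit_seq.upper():
        return "dimer"
    return "unclassified"


print(classify_hit(1000, 10, 32))                               # end
print(classify_hit(1000, 500, 522, vector_spans=[(400, 500)]))  # flanked_by_vector
print(classify_hit(1000, 500, 522, hit_seq=ZAP_DIMER))          # dimer
```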

A subset of 210 matches (from the “nr” database) with Clontech’s Marathon primer adaptors was examined more closely. These adaptors are part of Clontech’s suppression subtractive hybridization procedure (U.S. patents 5,565,340 and 5,759,822), originally used to make cDNA libraries and probes (16,17). Currently, a single 44-bp adaptor (P1/PN1) is used in both Marathon and PCR-Select products. The first guanine residue in P1 has been changed to a cytosine in recent Clontech kits, so this position is written as S (G or C).

STAATACGACTCACTATAGGGC TCGAGCGGCCGCCCGGGCAGGT (P1 = first 22 nt; PN1 = last 22 nt)

In the first Clontech libraries utilizing this technology, a second adaptor (P2/PN2) was also used (16).

TGTAGCGTGAAGACGACAGAA AGGGCGTGGTGCGGAGGGCGGT (P2 = first 21 nt; PN2 = last 22 nt)
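The first base of P1 is written with the IUPAC ambiguity code S (G or C), so an exact string search for P1 would miss one of the two kit versions. The minimal sketch below uses Python's standard `re` module to convert IUPAC codes into a regular expression that matches both variants. The helper name is hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> re.Pattern:
    """Compile an IUPAC-coded DNA pattern into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in pattern.upper()))


P1 = "STAATACGACTCACTATAGGGC"
PN1 = "TCGAGCGGCCGCCCGGGCAGGT"
p1_re = iupac_to_regex(P1)

print(len(P1 + PN1))                                    # 44
print(bool(p1_re.fullmatch("GTAATACGACTCACTATAGGGC")))  # True (original G)
print(bool(p1_re.fullmatch("CTAATACGACTCACTATAGGGC")))  # True (recent kits, C)
```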

Of 210 matches with Clontech Marathon adaptors, at least 82 (39%) are contaminated in regions designated as gene or cDNA sequence, including 11 open reading frames (5%). Through literature review and personal communications, we confirmed that Clontech protocols had been used. Published literature shows these false sequences appearing in transposons, protein sequences, regions used to join contigs, and other biologically relevant regions. In fact, we found published accounts of (unrecognized) contaminated sequence in most major journals of genetics and molecular biology.

The recognition of adaptor contamination has the potential to resolve many problems in the literature (14,15). It is expected that removing adaptor contamination will clarify many gene sequences as individual labs reinterpret their own sequences, and will prevent those mining data from amplifying such errors.

Acknowledgments

We thank the scientists who corresponded with us regarding their GenBank entries, Sophia Clotho for advice, Ron Sederoff for critical review, and staff at NCBI for their correspondence.

References

1. Lamperti, E.D., J.M. Kittelberger, T.F. Smith, and L. Villa-Komaroff. 1992. Corruption of genomic databases with anomalous sequence. Nucl. Acids Res. 20:2741-2747.

2. Lopez, R., T. Kristensen, and H. Prydz. 1992. Database contamination. Nature 355:211.

3. Reynolds, T.L. 1994. Vector DNA artifacts in the nucleotide-sequence database. Biotechniques 16:1124-1125.

4. Harger, C., M. Skupski, J. Bingham, A. Farmer, S. Hoisie, P. Hraber, D. Kiphart, L. Krakowski, et al. 1998. The Genome Sequence DataBase (GSDB): improving data quality and data access. Nucl. Acids Res. 26:21-26.

5. Miller, C., J. Gurd, and A. Brass. 1999. A RAPID algorithm for sequence database comparison: application to the identification of vector contamination in the EMBL databases. Bioinformatics 15:111-121.

6. Seluja, G.A., A. Farmer, M. McLeod, C. Harger, and P.A. Schad. 1999. Establishing a method of vector contamination identification in database sequences. Bioinformatics 15:106-110.

7. Binns, M. 1993. Contamination of DNA database sequence entries with Escherichia coli insertion sequences. Nucl. Acids Res. 21:779.

8. White, O., T. Dunning, G. Sutton, M. Adams, J.C. Venter, and C. Fields. 1993. A quality-control algorithm for DNA-sequencing projects. Nucl. Acids Res. 21:3829-3838.

9. Gersuk, V.H. and T.M. Rose. 1993. Database contamination. Science 260:606.

10. Dean, M. and R. Allikmets. 1995. Contamination of cDNA libraries and expressed-sequence-tags databases. Am. J. Hum. Genet. 57:1254-1255.

11. Wenger, R.H. and M. Gassmann. 1995. Mitochondria contaminate databases. Trends Genet. 11:167-168.

12. Gonzalez, I.L. and J.E. Sylvester. 1997. Incognito rRNA and rDNA in databases and libraries. Genome Res. 7:65-70.

13. Yoshikawa, T., A.R. Sanders, and S.D. Detera-Wadleigh. 1997. Contamination of sequence databases with adaptor sequences. Am. J. Hum. Genet. 60:463-466.

14. Coker, J.S. and E. Davies. 2002. Correspondence re: A.H. Ree et al., Expression of a Novel Factor in Human Breast Cancer Cells with Metastatic Potential (Cancer Res., 59: 4675-4680, 1999). Cancer Res. 62:4164-4165.

15. Forster, P. 2003. To err is human. Ann. Hum. Genet. 67:2-4.

16. Diatchenko, L., Y.-F.C. Lau, A.P. Campbell, A. Chenchik, F. Moqadam, B. Huang, S. Lukyanov, K. Lukyanov, et al. 1996. Suppression subtractive hybridization: A method for generating differentially regulated or tissue-specific cDNA probes and libraries. Proc. Natl. Acad. Sci. USA 93:6025-6030.

17. Jin, H., X. Cheng, L. Diatchenko, P.D. Siebert, and C.C. Huang. 1997. Differential screening of a subtracted cDNA library: a method to search for genes preferentially expressed in multiple tissues. Biotechniques 23:1084-1086.


Table 1. Sequences and search parameters to identify entries in GenBank contaminated by 7 commercial adaptor sequences.

Adaptor | Sequence to search | Detected by VecScreen? | Filter | E-value | Word size | Identity (%) | Matches in Eukaryota: nr database | Matches in Eukaryota: EST database
Clontech P1/PN1 | TCGAGCGGCCGCCCGGGCAGGT | Yes | none | 1 | 7 | 100 | 255 | 11655
Clontech P2/PN2 | AGGGCGTGGTGCGGAGGGCGGT | No | none | 1 | 7 | 100 | 13 | 705
Clontech EcoRI | AATTCGCGGCCGCGTCGAC | Yes | none | 0.05 | 7 | 100 | 156 | 15071
Promega EcoRI | AATTCCGTTGCTGTCG | No | none | 5 | 7 | 100 | 120 | 1167
Stratagene/Amersham Pharmacia EcoRI/NotI | AATTCGCGGCCGC | No | none | 150 | 7 | 100 | 765 | 16196
Stratagene ZAP | AATTCGGCACGAG | No | none | 150 | 7 | 100 | 3166 | 28830
Stratagene ZAP (dimer) | CTCGTGCCGAATTCGGCACGAG | No | none | 0.005 | 7 | 100 | (778) | (24106)
Life Technologies 3' RACE | GGCCACGCGTCGACTAGTAC | Yes | none | 10 | 7 | 100 | 53 | 66
Total | | | | | | | 4528 | 73690 (nr + EST = 78218)

Values in parentheses (Stratagene ZAP dimer) are not added into the totals.

[Figure 1 schematic: a cDNA flanked by adaptors is cloned into a bacterial plasmid and sequenced from a sequencing start site, giving unedited sequences 1-3. The three types of adaptor contamination are: 1) 5’ or 3’ end; 2) flanked by vector; 3) adaptor dimers. The Stratagene ZAP adaptor is shown as AATTCGGCACGAG (top strand) / GCCGTGCTC (bottom strand). The dimer sequence is shown as CTCGTGCCGAATTCGGCACGAG (top strand) / GAGCACGGCTTAAGCCGTGCTC (bottom strand).]

Figure 1. The path from sequencing a cDNA to an improperly edited sequence. More than 99% of sequences contaminated with adaptors fall into one of the 3 groups shown at the bottom.
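The dimer sequence in Figure 1 appears consistent with two ZAP adaptors annealed through their self-complementary AATT (EcoRI) overhangs. Under that assumption, the short Python sketch below reconstructs the dimer's top strand from the adaptor. `cohesive_dimer` is a hypothetical helper written for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cohesive_dimer(adaptor: str, overhang: str) -> str:
    """Top strand of two adaptors joined head-to-head through a shared 5' overhang."""
    if not adaptor.startswith(overhang):
        raise ValueError("adaptor must begin with the overhang")
    if revcomp(overhang) != overhang:
        raise ValueError("overhang must be self-complementary to pair with itself")
    # String-wise, the dimer is the reverse complement of one adaptor followed
    # by the other adaptor, with the shared overhang written only once.
    return revcomp(adaptor) + adaptor[len(overhang):]


ZAP = "AATTCGGCACGAG"
dimer = cohesive_dimer(ZAP, "AATT")
print(dimer)                    # CTCGTGCCGAATTCGGCACGAG (Table 1 search string)
print(dimer == revcomp(dimer))  # True: the dimer is its own reverse complement
```

Because the dimer is palindromic, a single-strand search for it also catches dimers in the opposite orientation.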


Cleaning data mined from the indica rice genome draft

Jeffrey S. Coker and Eric Davies

Department of Botany, North Carolina State University, Campus Box 7612, Raleigh, North Carolina 27695.

Filtering out false sequences is a challenge for every genome project. Because the Oryza sativa L. ssp. indica genome draft (1) is a major resource for efforts to improve the world food supply, its accuracy is of paramount importance, and it needs to be scrutinized very closely. The analysis presented here is intended especially for those mining data from the indica genome. It identifies false sequences of three different types: short (< 21 bp) remnants of SmaI-linearized pUC18 plasmid, regions of other cloning vector(s), and genomic sequence from an unidentified species of Phytophthora.

Recommendations are given for how to identify each type of false sequence when using data mined from the indica genome draft. Removal of false sequences is necessary to avoid errors in calculating polymorphism rates, gene discovery, estimating lateral gene transfer, and many other forms of bioinformatics research.

SmaI-linearized pUC18 plasmid

It was reported that a SmaI-linearized pUC18 plasmid was used for cloning rice genomic fragments (1), and thus it follows that each rice sequence would have been flanked by pUC18 before the sequence was “cleaned”. We have found that short remnants of pUC18 are still scattered throughout the indica genome. As shown in Table 1, 98% of matches with the pUC18 SmaI site (≥14 bp) in both the unassembled data and the fully masked reads end within 5 bp of a 5’ or 3’ end. All but four sequences in the unassembled data and one fully masked read are within 15 bp of an end. This suggests that the vast majority of matches with the pUC18 SmaI site derive from cloning vector and are not genuine rice sequences. Peripheral contaminants in unassembled data are not a problem as long as they are removed before assembly.

A much more significant problem occurs when these contaminants become internalized as sequences are joined together. Table 2 shows examples of internalized pUC18 artifacts found among the scaffold matches counted in Table 1. The ratio of internalized contaminants to total contaminants leads us to conclude that 5-7% of peripheral contaminants were internalized during contig/scaffold construction. Each scaffold in Table 2 matches japonica rice entries in GenBank directly before and after the short region in question but not within it, proving that each is a false sequence. For example, Scaffold 9177 (GenBank acc. no. AAAA01009177) contains a pUC18 fragment at 6913 bp and matches japonica sequences on both sides of the fragment (Table 2).

Although the pUC18 fragment is only 20 bp long, the “hole” in the indica sequence (compared to japonica) is 517 bp long. There are many examples of such holes that are clearly not biological in origin. A comparison of Chromosome 4 between indica and japonica led to two suggestions: that the japonica sequence may be “larger” because of insertions of transposable elements, and that the average frequency of single-nucleotide polymorphisms is 1 SNP per 268 bp (3). However, since many apparent insertions and SNPs are due to the presence of false sequences and holes in the indica draft, such conclusions about differences between indica and japonica may be premature.

Since contamination by 14-20 bp fragments is present, a much larger number of scaffolds are expected to contain 1-13 bp fragments of the pUC18 SmaI site. For instance, random chance would furnish only 4.5 matches with the 13-nucleotide sequence preceding the SmaI site (CTAGAGGATCCCC), but indica scaffolds have 1274 matches, while japonica has only 10 (2). Comparing the number of possible pUC18 artifacts (7-20 bp) with the number of matches expected by chance (E-values) leads to a prediction of over 13,000 contaminants (Fig. 1), or 0.029% of the total contig length. The 7-20 bp pUC18 fragments alone (not including 1-6 bp fragments and the “holes” they often represent) could account for 14% of the SNPs (1 SNP per 269 bp) between indica and japonica (3).
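The chance expectation quoted above can be approximated with a simple model. The sketch below makes three assumptions:

- The genome is an i.i.d. sequence with equal base frequencies.
- Both strands are counted.
- Each sequence loses k − 1 unusable start positions.

It illustrates the arithmetic and is not necessarily how the expect values in Fig. 1 were computed. Real base composition, or a palindromic k-mer counted on both strands, would change the numbers. The function name and the 100 Mb example length are hypothetical.

```python
def expected_chance_matches(total_length: int, k: int, n_sequences: int = 1,
                            strands: int = 2) -> float:
    """Expected exact occurrences of one specific k-mer in random DNA.

    Assumes independent, equiprobable bases, so a given k-mer occurs at a given
    position with probability 4**-k, and each of `n_sequences` loses k - 1
    possible start positions.
    """
    positions = max(total_length - n_sequences * (k - 1), 0)
    return strands * positions / 4 ** k


# Each extra base in the match divides the chance expectation by 4.
assembly_length = 100_000_000  # hypothetical 100 Mb, for illustration only
for k in (20, 13, 7):
    print(k, expected_chance_matches(assembly_length, k))
```

Comparing an observed count (such as the 1274 indica matches to CTAGAGGATCCCC) with this kind of expectation is what makes the excess of SmaI-site fragments stand out.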

For those mining data from the indica rice genome, we recommend the following steps:
1) Search all sequences for fragments of the pUC18 SmaI site (GTCGACTCTAGAGGATCCCC).
2) Remove the pUC18 sequences when they occur at the end(s).
3) For internal pUC18 matches, take 200-500 bp of sequence surrounding each possible pUC18 artifact and Blast it against japonica and/or other rice sequences in GenBank. If the region is not genuine rice sequence, the rice sequences may match the indica sequence on either side of the putative pUC18 fragment but not within it. Closer examination usually reveals a “hole” in the indica sequence ranging from 10 bp to several thousand base pairs.
Data miners should also be aware that every pUC18 contaminant that is at least 12 bp long contains a potential false stop codon (TAG) at bases 10 to 12, counting back from the SmaI site.
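Steps 1 and 2 above can be sketched in Python. The code below does the following:

- It looks for 3’-anchored fragments of the 20-bp pUC18 sequence ending at the SmaI site (the nested series listed in the Fig. 1 legend), on both strands.
- It keeps the longest match at each site.
- It labels each hit as terminal (within 5 bp of an end, as in Table 1) or internal.

The function names, the 14-bp default minimum length, and the choice to search the reverse strand are assumptions. Internal hits would still need the Blast comparison described in step 3.

```python
SMAI_FLANK = "GTCGACTCTAGAGGATCCCC"  # pUC18 sequence ending at the SmaI site
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Every fragment of at least 12 bp begins with TAG (bases 10-12 counted back
# from the SmaI site), a potential false stop codon.
assert SMAI_FLANK[-12:].startswith("TAG")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_puc18_fragments(seq: str, min_len: int = 14, end_window: int = 5):
    """Return sorted (start, end, strand, location) tuples for SmaI-anchored fragments.

    Fragments are suffixes of SMAI_FLANK (e.g. TCGACTCTAGAGGATCCCC, 19 bp).
    Coordinates are 0-based and end-exclusive. A shorter match lying inside a
    longer match that was already recorded is skipped.
    """
    seq = seq.upper()
    hits = []
    for length in range(len(SMAI_FLANK), min_len - 1, -1):
        fragment = SMAI_FLANK[-length:]
        for strand, probe in (("+", fragment), ("-", revcomp(fragment))):
            start = seq.find(probe)
            while start != -1:
                end = start + length
                if not any(s <= start and end <= e for s, e, _, _ in hits):
                    terminal = min(start, len(seq) - end) <= end_window
                    hits.append((start, end, strand, "end" if terminal else "internal"))
                start = seq.find(probe, start + 1)
    return sorted(hits)


def trim_terminal_fragments(seq: str, min_len: int = 14, end_window: int = 5) -> str:
    """Step 2: cut away fragments lying at either end; internal hits are left alone."""
    left, right = 0, len(seq)
    for start, end, _, location in find_puc18_fragments(seq, min_len, end_window):
        if location != "end":
            continue
        if start <= end_window:
            left = max(left, end)      # fragment at the 5' end
        else:
            right = min(right, start)  # fragment at the 3' end
    return seq[left:right]


read = "A" * 50 + SMAI_FLANK + "A" * 50
print(find_puc18_fragments(read))  # [(50, 70, '+', 'internal')]
```

Fragments shorter than `min_len` match by chance far more often, as the expect values in Fig. 1 indicate, so lowering the threshold trades sensitivity for many more false positives.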

Regions of other cloning vector(s)

It appears that vectors other than pUC18 were also used for indica library construction. In some cases, matches with a particular vector appear on both ends of a scaffold and correspond with a restriction site in that vector. For example, over 100 bp of Life Technologies pZL1 from Lambda ZipLox (or a similar vector) is at the ends of at least 25 scaffolds (e.g. Scaffold 89563) (4). In other cases, such as Scaffolds 39078 (1276 bp), 45670 (1105 bp), and 82154 (691 bp), entire indica scaffolds are 99-100% identical to several dozen common vectors but match no rice sequences in GenBank or Syd (2). In other, more ambiguous cases (e.g. Scaffold 101296), scaffolds are near-perfect matches with both vectors and rice ESTs in GenBank but still match nothing in Syd. Judging by the large size of these matches, it is unlikely that all vectors used in library construction were accounted for in decontamination screens.

For those mining data from the indica genome, we recommend that sequences of particular interest be compared to the VecScreen database (4) and/or bacterial databases.

Phytophthora

Phytophthora species are well-known stramenopiles that commonly parasitize a wide variety of plant species. There are several dozen indica scaffolds that match Phytophthora sequences but do not closely match sequences either in japonica or in any other higher plant (Table 3). For example, Scaffold 45690 (Contig 77125) has 99.7% identity with 1107 bp of P. infestans mitochondrial DNA coding for three ribosomal proteins, but has no significant match with any plant sequence. Searches of indica identified 226 scaffolds that match GenBank Phytophthora sequences with an E-value of 1 × 10^-10 or lower (5). Many of these may be highly conserved rice sequences and not from Phytophthora. Even so, since sequences from Phytophthora are evidently present (Table 3) and no Phytophthora genome has been completely sequenced, these and perhaps many other scaffolds must be reassessed.

There are three possible explanations for Phytophthora-like sequences in the indica genome: pathogen-infected tissue, cross-contamination of libraries, and lateral gene transfer. It is quite possible that pathogen-infected rice tissue was used for DNA isolation, since pathogens are notoriously prevalent in plant tissue. The more exciting explanation would be lateral gene transfer after the divergence of indica from japonica.

However, we are unaware of any example of simultaneous lateral gene transfer of nuclear genes encoding mRNA (e.g. ric1 and actA) and rRNA (e.g. 18S), and mitochondrial genes encoding mRNA (e.g. rpl2, rps19, and rps3) and rRNA (e.g. 16S rRNA), all of which seem to be present in indica (Table 3).

For those mining data from the indica genome, we recommend that sequences of particular interest be compared to Phytophthora and japonica sequences (including ESTs). Contaminants will be nearly identical to Phytophthora sequences (if they have been sequenced in Phytophthora). On the other hand, if the indica sequence is nearly identical to a japonica sequence, then it is not likely to be a contaminant.
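The rule of thumb above can be expressed as a simple triage function. This is a sketch only:

- The 97% "nearly identical" cut-off is a hypothetical value chosen for illustration.
- Hits are (identical bases, aligned length) pairs such as those reported in Table 3.
- The japonica match is checked first, because the text treats near-identity to japonica as evidence against contamination.

```python
NEAR_IDENTICAL = 0.97  # hypothetical threshold, not taken from the study


def identity(hit):
    """Fraction identity from an (identical bases, aligned length) pair, or None."""
    return None if hit is None else hit[0] / hit[1]


def triage_scaffold(phytophthora_hit, japonica_hit, threshold=NEAR_IDENTICAL):
    """Classify a scaffold from its best Phytophthora and japonica hits."""
    p, j = identity(phytophthora_hit), identity(japonica_hit)
    if j is not None and j >= threshold:
        return "likely genuine rice"
    if p is not None and p >= threshold:
        return "likely Phytophthora contaminant"
    return "reassess"


# Scaffold AAAA01065444 from Table 3: 838/844 to P. undulata, 536/617 to japonica.
print(triage_scaffold((838, 844), (536, 617)))  # likely Phytophthora contaminant
```

Scaffolds landing in the "reassess" bin are the ambiguous cases, such as conserved genes, that need manual review.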

Conclusions

The indica rice genome draft has already been used to evaluate monocot and eudicot divergence (6), sequence variation between varieties of rice (3, 7), single nucleotide polymorphisms in rice varieties (3, 8), characteristics of various gene families (9, 10), and many other important topics. It serves as an important resource for improving world food supply and will be used extensively in the future, so it is critical that those mining the indica genome be aware of its imperfections.

References

1. J. Yu et al., Science 296, 79 (2002); http://210.83.138.53/rice/.

2. S.A. Goff et al., Science 296, 92 (2002); http://portal.tmri.org/rice/.

3. Q. Feng et al., Nature 420, 316 (2002).

4. P.A. Kitts, T.L. Madden, H. Sicotte, J.A. Ostell, manuscript in preparation; http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html.

5. All GenBank Phytophthora sequences (including ESTs) were searched against the indica genome using MegaBlast. Scaffolds with significant matches were then used to search all GenBank sequences (BLASTn).

6. M. Vincentz et al., Plant Physiol. 134, 951 (2004).

7. C. Li et al., Theor. Appl. Genet. 108, 392 (2004).

8. S. Nasu et al., DNA Res. 9, 163 (2002).

9. S. Griffiths et al., Plant Physiol. 131, 1855 (2003).

10. L. Jia et al., Plant Physiol. 134, 575 (2004).

11. We thank R. Dean and T. Houfek for confirming our Phytophthora findings and for discussing all of our results, J. Xiang for her thoughts on the conservation of rRNA, E. Coker for computer expertise, and Sophia Clotho for her thoughts.


Table 1. Matches in the indica genome with the pUC18 SmaI site (GTCGACTCTAGAGGATCCCC). Matches shown are at least 14 bp long (Expect ≤ 5.7). pUC18 sequences are typically on an end (within 5 bp) of raw genomic sequences such as those in the unassembled data and fully masked reads, but became internalized as contigs and scaffolds were pieced together.

Sequence type | Matches | Matches at a 5’ or 3’ end
Unassembled data | 1990 | 98.1% (1953)
Fully masked reads | 4342 | 98.0% (4255)
Contigs | 944 | 85.9% (811)
Scaffolds | 990 | 70.3% (696)

[Fig. 1 plot: number of sequences (y-axis, 0 to 35,000) versus length of match with the pUC18 SmaI site in bp (x-axis, 20 down to 7). Series: matches in indica genome; expect value.]

Fig. 1. Matches of 20 bp (GTCGACTCTAGAGGATCCCC), 19 bp (TCGACTCTAGAGGATCCCC), 18 bp (CGACTCTAGAGGATCCCC), etc. in the indica genome corresponding to the pUC18 SmaI site. The expect values approximate the number of hits one would expect by chance, assuming a random genome sequence. This leads to a prediction of over 10,000 contaminants of 7 bp or longer.


Table 2. Examples of internal pUC18 artifacts (≥14 bp) in indica scaffolds. In each case shown, the corresponding japonica sequence matches the indica scaffold directly before and after the artifact. The “holes” in the indica sequences range from 14 to several thousand bp long. All artifacts shown are more than 100 bp from a scaffold end and from any unfilled gaps within scaffolds (designated by a stretch of N's in GenBank). Scaffolds are listed as their GenBank accession numbers (AAAA01 + scaffold number) to facilitate further review.

Scaffold | Length (bp) | Artifact location (bp) | Corresponding japonica match
AAAA01000517 | 40212 | 6774 | AP005289.2
AAAA01000875 | 33584 | 15658 | AC124836.2
AAAA01000879 | 34316 | 29865 | AC090484.4
AAAA01001305 | 30163 | 10745 | AC137634.3
AAAA01001453 | 29009 | 8647 | AP003282.2
AAAA01002627 | 22827 | 863 | AC146893.1
AAAA01004136 | 18264 | 9608 | AC137073.2
AAAA01005244 | 16035 | 1108 | AP004762.3
AAAA01006321 | 14292 | 8637 | AC137999.2
AAAA01008123 | 11429 | 2056 | AE017073.1
AAAA01009177 | 10884 | 6913 | AP003204.3
AAAA01009685 | 10424 | 3987 | AL663008.3
AAAA01011822 | 8659 | 452 | AP003988.2
AAAA01011882 | 8621 | 330 | AP004262.2
AAAA01011939 | 8590 | 1286 | AP005002.2
AAAA01014582 | 6939 | 1294 | AC136520.2
AAAA01015702 | 6320 | 5417 | AL663018.4
AAAA01018944 | 4789 | 176 | AC135928.2
AAAA01019811 | 4431 | 3969 | AE017063.1
AAAA01019999 | 4366 | 4148 | AP003301.3
AAAA01020286 | 4259 | 1857 | AL606992.3
AAAA01022160 | 3609 | 819 | AC137607.2
AAAA01029543 | 2088 | 540 | AP003518.2
AAAA01054885 | 966 | 812 | AE017102.1

Table 3. Examples of Phytophthora-like sequences in the indica genome. "Closest" matches are defined as those with the lowest E-value (E<10) in GenBank databases. In all cases shown here, the Phytophthora match spanned the majority of the scaffold and had an effective E-value of 0. Short regions (18-80 bp) on the ends of 8 of these scaffolds are also contaminated by plasmid sequences.

Indica scaffold | Closest match in all organisms: Identity | Acc. No. | Description | Closest match in japonica genome: Identity | Acc. No. | Description
AAAA01045690 | 1104/1107 (99%) | U17009.2 | P. infestans rib. prot. L2, S19, and S3 | ------ | ------ | ------
AAAA01065444 | 838/844 (99%) | AJ238654.1 | P. undulata 18S rRNA gene | 536/617 (86%) | AP004778.3 | Genomic DNA, chromosome 2
AAAA01078719 | 705/709 (99%) | X54265.1 | P. megasperma 16S rRNA | 613/715 (85%) | AP004778.3 | Genomic DNA, chromosome 2
AAAA01076286 | 639/647 (98%) | BE776357.1 | P. infestans unidentified cDNA | ------ | ------ | ------
AAAA01070180 | 630/633 (99%) | BE776214.1 | P. infestans unidentified cDNA | ------ | ------ | ------
AAAA01084216 | 630/636 (99%) | BE777367.1 | P. infestans unidentified cDNA | ------ | ------ | ------
AAAA01070144 | 581/584 (99%) | BE775905.1 | P. infestans unidentified cDNA | 381/437 (87%) | AK063121.1 | cDNA clone:001-111-E07
AAAA01090700 | 579/587 (98%) | AJ133023.1 | P. infestans ric1 gene | ------ | ------ | ------
AAAA01091080 | 556/557 (99%) | U50844.1 | P. infestans host-specific elicitor inf1 gene | ------ | ------ | ------
AAAA01082659 | 556/559 (99%) | BE776610.1 | P. infestans unidentified cDNA | ------ | ------ | ------
AAAA01086249 | 557/567 (98%) | BE776104.1 | P. infestans unidentified cDNA | ------ | ------ | ------
AAAA01055069 | 555/584 (95%) | BE776247 | P. infestans unidentified cDNA | 832/904 (92%) | AK060330.1 | cDNA clone:001-008-B01
AAAA01049644 | 489/498 (98%) | BE777164 | P. infestans unidentified cDNA | ------ | ------ | ------
AAAA01063300 | 444/445 (99%) | M59715.1 | P. infestans actin (actA) gene | 387/457 (84%) | AK059967.1 | cDNA clone:006-211-F12
AAAA01102792 | 237/237 (100%) | AF339424.1 | P. infestans 5.8S rRNA (and spacer) | ------ | ------ | ------


Chapter 3

Selection of Candidate Housekeeping Controls in Tomato Plants using EST Data

Jeffrey S. Coker and Eric Davies

Eric Davies provided guidance and editorial assistance.

This chapter was published in 2003 in the journal Biotechniques 35, 740-748. It is currently being considered for a patent under the title “Method for Identifying Constantly Expressed Genes Using Nucleic Acid Sequence Data” (NCSU Disclosure File Number 04-064).


Chapter 4

Identification, Conservation, and Relative Expression of V-ATPase cDNAs in Tomato Plants

Jeffrey S. Coker, Derek Jones, and Eric Davies

Derek Jones assisted in mining data for c subunit cDNAs. Eric Davies provided guidance and editorial assistance.

This chapter was published in 2003 in the journal Plant Molecular Biology Reporter 21, 145-158.


Chapter 5

Identification, Accumulation, and Functional Prediction of Novel Tomato Transcripts Systemically Up-regulated after Fire Damage

Jeffrey S. Coker, Alan Vian, and Eric Davies

Alan Vian constructed the subtractive cDNA library. Eric Davies provided guidance and editorial assistance.

This chapter has been submitted for publication.

Abstract

Despite the major impacts of fire on plants, responses to fire damage have not been closely studied on the level of gene expression. Here we present analyses of novel transcripts from tomato (Lycopersicon esculentum) which are systemically up-regulated in leaves after a distant leaf is wounded by flame. Nine cDNA fragments were isolated from a subtractive cDNA library of leaf tissue 1 hour after flaming. Using data mining and PCR, full-length open reading frames were predicted, amplified, and then sequenced.

Comparisons with the Arabidopsis genome suggested that 8 of the encoded proteins are slow-evolving. Real-time RT-PCR using leaf RNA isolated at different times after flaming confirmed the systemic accumulation of 4 and 7 transcripts within 30 and 60 minutes, respectively; these transcripts then returned to basal levels within 3 hours. During this same time course, proteinase inhibitor I levels gradually increased over 30-fold in 6 hours. Expression analyses also showed that 8 of the transcripts are present in unwounded leaf, stem, and root tissues.

The predicted proteins include an acyl carrier protein, adenylyl sulfate reductase, PS II oxygen-evolving complex protein 3, an anion:sodium symporter, a chloroplast-specific ribosomal protein, a histidine triad family protein, and an unknown wound/stress-related protein.

Homologues of several of these proteins have been associated with other types of wound and stress responses. It appears that within an hour after being damaged by fire, plants systemically up-regulate a variety of genes involved with basic cell metabolism and upkeep, in addition to classic defense genes such as proteinase inhibitor I.

Introduction

Plants must cope with a wide variety of natural wounding stimuli such as fire, herbivory, wind, rain, hail, UV radiation, sand, and trampling. Because plants are sessile and cannot escape these stimuli, their survival often depends on responding to tissue damage with changes in gene expression (Graham et al., 1986; Braam and Davis, 1990; Schaller and Ryan, 1996; León et al., 2001). These changes occur both in damaged tissues (local responses) and in undamaged tissues (systemic responses). Many “systemic wound response proteins” (Schaller and Ryan, 1996), which are expressed in undamaged tissues following the intercellular transmission of a wound signal, have been previously identified in tomato plants. These include proteinase inhibitors (Green and Ryan, 1972), systemin (Pearce et al., 1991), an aspartic protease (Schaller and Ryan, 1996), chloroplast mRNA-binding protein (Vian et al., 1999), a bZIP DNA-binding protein (Stanković et al., 2000), allene oxide synthase and fatty acid hydroperoxide lyase (Howe et al., 2000), and others.

Further characterization of the array of systemically up-regulated genes is necessary to better understand plant defense and stress response mechanisms.

Knowledge of systemically up-regulated genes is also necessary to characterize the intercellular signals that move from wounded to unwounded tissue. Systemic signals that have been proposed include proteinase inhibitor-inducing factor (Ryan, 1974), systemin (Pearce et al., 1991), abscisic acid (Peña-Cortés et al., 1991), oligosaccharides (Ryan and Farmer, 1991), methyl jasmonate (Herde et al., 1996), action potential (Stanković and Davies, 1996), and variation potential (Wildon et al., 1992; Vian et al., 1996). It is clear that the systemic wound response is a complex network (or networks) induced by many different signals, and that the extent and timing of these signals may vary significantly depending on the plant species and the precise nature of the wound. For example, evidence from Arabidopsis microarray experiments suggests that there are fundamental differences in gene expression in response to mechanical wounding and insect feeding (Reymond et al., 2000). On the other hand, there is clear evidence for cross-talk between defense responses, such as those that are herbivore- and pathogen-directed (Stennis et al., 1998). How responses to fire damage compare with other types of wound responses remains largely unknown.

Fire impacts most terrestrial ecosystems, and plants have evolved mechanisms to survive fire (Bond and van Wilgen, 1996; DeBano et al., 1998). For example, in the southeastern United States, shrubs and herbaceous plants in savannas, forests, evergreen shrub bogs, wire grass sand-hills, swamps, and other ecosystems often survive fires and are able to resprout and reproduce in future years (Bond and van Wilgen, 1996; DeBano et al., 1998; Wells, 2002). In fact, some of the most species-rich plant ecosystems (e.g. the herbaceous groundcover of longleaf pine savannas) require fire to persist (Platt et al., 1988; Drewa et al., 2002). A common misconception is that all wildfires kill all plants in the burned area. The National Park Service has used a 5-tiered “burn severity class” system to describe vegetation damage following a wildfire. This system includes undamaged (tier 1), scorched (tier 2; leaf litter is singed and foliage is slightly yellowed), and low severity (tier 3; leaf litter is partly/mostly consumed but foliage remains intact) classes (USDI, 1992). Resprouting after fire damage can occur from partially burned above-ground organs or from roots after complete destruction of above-ground organs.

Despite the major impacts of fire on plants, responses to fire damage have not been closely studied on the level of gene expression. From an experimental standpoint, flame causes severe, yet reproducible, damage without moving the plant. Leaf flaming has already proven useful for identifying novel components of the systemic wound response to fire such as Pin 1 (Wildon et al., 1992; Stanković and Davies, 1996), chloroplast mRNA-binding protein (Vian et al., 1999), and a bZIP DNA-binding protein (Stanković et al., 2000).

For studying the impacts of fire damage (flame wounding), tomato plants have several advantages. First, since extensive work with other wound stimuli has been done using tomato plants, it is possible to compare flame-induced gene expression with this previous work. Second, a substantial amount is known about wound signaling events in tomato plants, which will facilitate understanding of the timing of the response. Finally, like many species in the Solanaceae, tomato plants (both wild and cultivated) possess several characteristics that typically allow herbaceous plants to survive fires. These characteristics include being a perennial (Taylor, 1986), having carbohydrate reserves stored in underground organs (Peres et al., 2001; Verdaguer and Ojeda, 2002), and the ability to regenerate shoots from hypocotyls, roots, or other tissues (Takashina et al., 1998; Bertram and Lercari, 2000; Peres et al., 2001). It has been found that smoke extract stimulates the growth of tomato roots in vitro (Taylor and van Staden, 1998), and that growth of species within the Solanaceae can be regulated by fire regimes (Preston and Baldwin, 1999). Also, a bZIP gene similar to the one we found to be up-regulated by flame wounding (Stanković et al., 2000) has been associated with adventitious shoot regeneration (Low et al., 2001). Thus, tomato plants are the preferred model system for work on the systemic wound responses to fire damage.

For genes previously examined, the most common pattern of transcript accumulation in leaf 4 of three-week-old tomato plants following a flame wound on leaf 3 is an increase that peaks within an hour, followed by a rapid decrease (Davies et al., 1997; Vian et al., 1999). These rapid changes are then followed by a more gradual period of increased, decreased, or unchanged transcript accumulation. This has been shown most clearly for Pin 1 (Stanković and Davies, 1997), CMBP (Vian et al., 1999), and a bZIP DNA-binding protein (Stanković et al., 2000). The complexity of the wound response of individual transcripts (rapid increases and decreases) and the variation between transcripts (different times of increase or decrease) suggest that different genes are up-regulated by different systemic signals, or combinations of signals. These signals cannot be deciphered without characterizing a wider array of transcripts that accumulate systemically following flame wounding.

Here we present analyses of 9 previously unidentified tomato cDNAs that are systemically up-regulated after a distant leaf is wounded by flame. These cDNAs were isolated from a subtractive cDNA library (wound minus control) constructed from leaf tissue harvested one hour after flaming.


Results

Our strategy for identifying and characterizing clones from a subtractive cDNA library of wound-induced transcripts is shown in Figure 1. Clones from the library were labeled as “candidates for the systemic wound response” (CSWR). The 9 clones initially isolated from the cDNA library ranged from 59 to 647 bp and had an average length of 292 bp (Table 1). Attempts to identify them using Blast searches of GenBank were inconclusive or ambiguous. Therefore, we searched expressed sequence tags (ESTs) in the TIGR Tomato Gene Index (TGI) for identical matches and extended the cDNA sequences using consensus sequence information (Table 1). The resulting putative cDNAs ranged from 596 to 1830 bp and had an average length of 1048 bp (Table 1). These putative cDNAs were confirmed by PCR (Fig. 2) and by sequencing the PCR products using the primers in Table 2.

Blast searches using the extended sequences returned matches with protein sequences in GenBank ranging from 43% to 83% identity (Table 1). The putative translations of all 9 cDNAs suggested full-length proteins approximately the same size as their respective GenBank matches. Therefore, all 9 cDNAs appear to encode proteins similar to those sequenced in other plants, although the exact functions of most are still unknown.

By comparing tomato Unigenes in the TIGR TGI with the Arabidopsis genome (using tBlastx), Van der Hoeven et al. (2002) divided tomato ESTs into “not homologous” (E value ≥ 0.1), “fast-evolving” (1.0E-15 < E value < 0.1), “intermediate-evolving” (1.0E-50 < E value < 1.0E-15), and “slow-evolving” (E value < 1.0E-50) classes. Only about 22% of all Unigenes fell into the “slow-evolving” class. By repeating their methodology, we found that 8 of our cDNAs could be classified as “slow-evolving” and 1 (CSWR-1) as “intermediate-evolving”. This high degree of conservation could reflect responses to fire damage being ancestral (Bond and van Wilgen, 1996) and/or incorporating elements of basic cell metabolism and maintenance. Most tomato genes directly involved in cell rescue, defense, cell death, and aging are not fast-evolving (Van der Hoeven et al., 2002). Homologues of 5 of the 9 tomato cDNAs (CSWR-1, 2, 4, 6, and 8) were found on Arabidopsis chromosome 4, which is notable because a high proportion (approximately 12%) of all genes on chromosome 4 have been associated with defense and disease responses (Mayer et al., 1999).
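The four-class scheme above amounts to a threshold rule on the best tBlastx E value. The Python sketch below illustrates it under stated assumptions: the function name is hypothetical, and because the published ranges use strict inequalities, the handling of E values exactly equal to 1.0E-15 or 1.0E-50 is our own choice rather than something specified by Van der Hoeven et al. (2002).

```python
def evolution_class(e_value: float) -> str:
    """Assign a best-hit tBlastx E value to one of four conservation classes.

    Thresholds follow the ranges quoted in the text. Values exactly equal to
    1.0E-15 or 1.0E-50 are placed in the intermediate class by assumption.
    """
    if e_value >= 0.1:
        return "not homologous"
    if e_value > 1.0e-15:
        return "fast-evolving"
    if e_value >= 1.0e-50:  # boundary values assigned here (assumption)
        return "intermediate-evolving"
    return "slow-evolving"  # includes E values reported as 0.0


if __name__ == "__main__":
    # Hypothetical best-hit E values, for illustration only.
    for e in (0.5, 1.0e-5, 1.0e-20, 1.0e-100, 0.0):
        print(f"{e:g}\t{evolution_class(e)}")
```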

ESTs are a useful tool for the preliminary analysis of gene expression (Adams et al., 1995; Coker et al., 2003; Coker and Davies, 2003), and over 155,000 tomato ESTs from a variety of tissues are available in a single collection in the TIGR TGI (Van der Hoeven et al., 2002). To further characterize our 9 cDNAs, organ-specific expression analysis was performed using both data mining (Fig. 3) and experimental approaches (Fig. 4).

The EST analysis in Figure 3 and the PCR experiments in Figure 4 support several common trends. First, all 9 cDNAs are present in unwounded leaf tissue, usually at low levels. This was not anticipated, but it is not surprising, because the subtractive library technique we used selects for up-regulated transcripts rather than only those present in one tissue and absent from another. Second, although our subtractive cDNA library was constructed from leaf tissue, none of the cDNAs are leaf-specific. CSWR-1 was represented by ESTs only from leaf/shoot tissue (Fig. 3), but PCR showed that it is also present in roots (Fig. 4). All other cDNAs were present in multiple tissues in both analyses. Third, CSWR-1, CSWR-3, CSWR-4, and CSWR-7 are more abundant in leaves than in other tissues. Fourth, CSWR-6 and CSWR-8 are most abundant in root tissues. Fifth, CSWR-2 and CSWR-5 appear to be present at relatively constant levels in different organs. Finally, CSWR-9 has very low abundance in all tissues.

The mRNA accumulation of CSWR-1 through CSWR-9 in leaf 4 following flaming of leaf 3 is shown in Figure 5. Two real-time RT-PCR experiments are shown at each timepoint. Two additional biological replicates for the 0 and 60 minute timepoints were processed in a separate set of experiments and further support transcript up-regulation after flame wounding (data not shown). Several technical checks indicated that the real-time RT-PCR reactions amplified a specific cDNA with high efficiency. Melting curves for all reactions showed only one peak, suggesting a single PCR product. The efficiency of real-time PCR can be calculated from the slope of a standard curve of Ct versus log10(quantity) built from a 10-fold dilution series; the ideal slope, corresponding to 100% efficiency, is -3.32. Efficiency was above 99% across more than 4 orders of magnitude on all of our plates.
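The relationship between slope and efficiency can be sketched in a few lines of Python. This is a minimal illustration, not the ABI SDS software used in this study; it assumes the standard relation efficiency = 10^(-1/slope) - 1, under which a slope of -3.32 corresponds to a doubling of product per cycle. The function names and the Ct values in the usage example are hypothetical.

```python
import math


def standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Fractional efficiency per cycle; 1.0 means the product doubles (100%)."""
    return 10 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Hypothetical 10-fold dilution series and Ct values, for illustration only.
    dilution = [1, 10, 100, 1000, 10000]
    ct_values = [33.1, 29.8, 26.5, 23.1, 19.8]
    slope, intercept = standard_curve(dilution, ct_values)
    print(f"slope = {slope:.2f}, efficiency = {amplification_efficiency(slope):.1%}")
```

With these made-up values the fitted slope is about -3.33, which corresponds to an efficiency just under 100%.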

Proteinase inhibitor I (Pin 1) was used as a positive control and actin as a housekeeping control. Pin 1 increased 5-fold over 60 minutes and 33-fold over 6 hours (Fig. 5). In all experiments, actin levels at 60 minutes (the peak of accumulation for CSWR-1 through CSWR-9) were not significantly different from control levels (data not shown).

The average transcript levels of seven of the nine candidates for the systemic wound response (CSWR-1, 2, 4, 5, 6, 7, and 9) more than doubled after flame wounding (Fig. 5). The average transcript levels of CSWR-1, 2, 4, and 7 more than doubled after only 30 minutes (Fig. 5). Although transcript up-regulation was evident in both experiments, the timing of the response varied: transcripts tended to peak at 30 and 60 minutes in experiments 1 and 2, respectively (Fig. 5). After 3 hours, transcript levels in both experiments had decreased to near their original levels. CSWR-6 and CSWR-7 may maintain slightly increased levels after 6 hours, but we cannot confirm this statistically in our experiments.

In contrast, CSWR-3 and CSWR-8 levels did not increase significantly relative to the 0 timepoint and showed somewhat erratic patterns of expression early in the timecourse (Fig. 5). Both actually decreased to below 50% of their original levels after 6 hours. In addition, all cDNAs except Pin 1 appeared to decrease slightly at 5 minutes, although this decrease was statistically significant only for CSWR-3 and CSWR-8. This decrease could be part of a general response to flame wounding, caused by increased mRNA degradation or an interruption of transcription.

The predicted proteins encoded by CSWR-1 through CSWR-9 are shown in Figure 6. Notably, all 4 transcripts that were more prevalent in leaves (Figs. 3 and 4) encode proteins with chloroplast transit peptides (Fig. 6), which indicates consistency between our experimental and bioinformatic approaches. The reported accuracies of the chosen software for predicting localization signals, transmembrane regions, and alpha helices/beta sheets are 85% (Emanuelsson et al., 2000), 89-94% (Tusnády and Simon, 2001; Rost, 1996), and 72% (Rost, 1996), respectively. The COILS program used to predict coiled-coil regions yields probabilities that reflect the coiled-coil-forming potential of a sequence. We accepted coiled-coils with at least 80% probability that were at least 28 amino acid residues long.
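The acceptance rule can be expressed as a short filter. This Python sketch assumes that per-residue coiled-coil probabilities have already been extracted from the COILS output into a list (parsing of the COILS output format is not shown), and it reads “at least 80% probability” as requiring every residue in a region to score at least 0.8; the function name and that reading are our assumptions.

```python
def coiled_coil_regions(probabilities, min_prob=0.8, min_length=28):
    """Return (start, end) spans, 0-based with end exclusive, in which every
    residue has a coiled-coil probability >= min_prob and the span covers at
    least min_length residues."""
    regions = []
    start = None
    for i, p in enumerate(probabilities):
        if p >= min_prob:
            if start is None:
                start = i  # a qualifying run begins
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i))
            start = None  # run broken by a low-probability residue
    # Close a run that extends to the final residue.
    if start is not None and len(probabilities) - start >= min_length:
        regions.append((start, len(probabilities)))
    return regions


if __name__ == "__main__":
    # Hypothetical probability profile: one 30-residue run, one 20-residue run.
    profile = [0.1] * 10 + [0.9] * 30 + [0.5] * 5 + [0.95] * 20 + [0.2] * 3
    print(coiled_coil_regions(profile))
```

In this made-up profile only the first run is long enough to be accepted.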


Discussion

Our results show that flame wounding induces the systemic up-regulation of several transcripts within an hour. Most studies of gene up-regulation during systemic wound responses have examined time courses from 1 to 24 hours. Nevertheless, a number of studies suggest that the systemic response begins in distant leaves within the first hour after wounding. For example, Orozco-Cardenas and Ryan (1999) found that hydrogen peroxide generated in response to leaf crushing can be detected in the veins of distant tomato leaves within an hour after wounding. Systemic accumulation of mRNAs encoding ethylene-responsive transcription factors (ERFs) peaks within the first 30 minutes after parts of a tobacco leaf are crushed, and returns to original levels after an hour (Nishiuchi et al., 2002). Cutting a petiole results in systemic accumulation of ERF3 and ERF4 transcripts in the first 10 minutes (Nishiuchi et al., 2002). Similarly, during systemic responses in tomato leaves, levels of phosphatidic acid increase fourfold within 5 minutes, while lysophosphatidylcholine and lysophosphatidylethanolamine increase twofold within 15 minutes (Lee et al., 1997). Microarray experiments suggest that mechanical wounding induces up-regulation of at least 20 genes after 15 minutes, some of which fall rapidly to their original levels (Reymond et al., 2000). In summary, there is substantial evidence that various components of systemic responses reach leaves distant from a wound within minutes, and our results extend this observation to fire-inflicted wounding.

When fire burns organic material, an oxidation-reduction reaction takes place in which chemical bonds in the fuel are broken and heat is released. When first heated, fuels produce water vapor and mostly noncombustible gases, which include terpenes and aromatic aldehydes (DeBano et al., 1998). Heat then causes pyrolysis, the chemical decomposition of fuel materials to yield organic vapors and charcoal, and eventually combustion. Inevitably, flame causes significant stress to a plant (oxidative, hydraulic, toxic, etc.) in addition to a local wound. Responses to wounding and other stresses could explain the up-regulation of many of the flame-induced transcripts described below.

CSWR-1 Acyl carrier protein

The CSWR-1 protein is 54% identical to the acyl carrier protein ACP4 in Arabidopsis, which plays a major role in the biosynthesis of fatty acids (Branen et al., 2003). Like ACP4, CSWR-1 is small (14 kD), expressed mostly in leaves (Figs. 3 and 4), and appears to be localized to the chloroplast (Fig. 6). ACP4 carries growing acyl chains through the various steps of fatty acid biosynthesis, which occurs mostly in plastids. Fatty acids function as crucial components of membrane lipids and as precursors of some signaling and defense compounds such as jasmonate. Arabidopsis plants expressing antisense ACP4 have a bleached appearance, reduced photosynthetic efficiency, and reduced lipid content (Branen et al., 2003).

CSWR-2 Adenylyl-sulfate reductase

The CSWR-2 protein (51 kD) is 75% identical to APR1 in Arabidopsis, which has oxidoreductase activity (acting on sulfur groups) and is involved in sulfate assimilation, the pathway by which inorganic sulfate is reduced and incorporated into organic compounds (Bick et al., 1998). This pathway leads to the synthesis of cysteine and the antioxidant glutathione (Bick et al., 2001). APR1 is regulated by oxidative stress (ozone, oxidized glutathione, etc.), which provides a mechanism to control the glutathione production necessary to combat oxidative stress (Bick et al., 2001). Like APR1 (Bick et al., 1998), CSWR-2 contains a chloroplast localization signal, a reductase domain, and a thioredoxin-like domain near the carboxyl terminus (Fig. 6).

CSWR-3 Unknown protein

CSWR-3 encodes a highly conserved 25 kD protein with no known function in any plant. Although no functional domains were detected, the protein is proline-rich and appears to have one transmembrane region (Fig. 6). It also has a putative chloroplast localization signal, consistent with its greater prevalence in leaves (Figs. 3 and 4).

CSWR-4 Photosystem II oxygen-evolving complex protein 3 (PsbQ)

CSWR-4 is 67% identical to the Arabidopsis photosystem II oxygen-evolving complex protein 3, and is characterized by a chloroplast localization signal and a C-terminal domain with 4 major alpha helices (Balsera et al., 2003). Transcriptional up-regulation of a homologue of this gene has been associated with salt stress (Sugihara, 2000), but not, to our knowledge, with wounding. Consistent with its predicted function in chloroplasts, CSWR-4 is expressed primarily in green tissue (Figs. 3 and 4).

CSWR-5 Putative anion:sodium symporter

CSWR-5 encodes a 44 kD, leucine-rich (13% leucine) membrane protein with approximately 10 transmembrane domains and a conserved anion:sodium symporter domain (Fig. 6). It also contains a putative leucine zipper motif at the C-terminus, where leucine recurs at every seventh position (Fig. 6), although this could be a coincidence resulting from the high leucine content. Although close homologues exist in other plants, they have not been functionally characterized. Homologues in yeast are necessary for resistance to arsenic compounds (Bobrowicz et al., 1997), and homologues in animals act as bile acid:sodium symporters in the liver (Hagenbuch et al., 1991).
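The heptad pattern, and the possibility that it arises by chance, can be illustrated with a small Python sketch. It scans a sequence for leucines at four consecutive heptad positions (i, i+7, i+14, i+21) and estimates how often such a pattern would occur at a given start position if residues were independent and leucine made up 13% of them. The four-repeat threshold, the independence assumption, and the function names are our own simplifications; 2ZIP and COILS use more elaborate models.

```python
def leucine_heptads(sequence, repeats=4, period=7):
    """Return 0-based positions i at which leucine occurs at i, i+7, ...,
    i + 7*(repeats - 1)."""
    seq = sequence.upper()
    span = period * (repeats - 1)
    return [
        i
        for i in range(len(seq) - span)
        if all(seq[i + k * period] == "L" for k in range(repeats))
    ]


def chance_per_start(leucine_fraction, repeats=4):
    """Probability that one start position shows the pattern by chance,
    assuming independent residues with the given leucine frequency."""
    return leucine_fraction ** repeats


if __name__ == "__main__":
    # Hypothetical fragment with leucine at every seventh position.
    print(leucine_heptads("AAAAAAL" * 4))  # prints [6]
    # 13% leucine, as reported for CSWR-5.
    print(chance_per_start(0.13))
```

Multiplying chance_per_start by the number of possible start positions gives a rough expected count of chance matches for a protein of a given length, which is one way to weigh the coincidence explanation.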

CSWR-6 Unknown wound/stress protein

Although CSWR-6 encodes a highly conserved 20 kD protein found in many higher plants, the molecular functions of all of its homologues are currently unknown.

Nevertheless, close homologues in other plants (E value < 1.0E-50) show an unmistakable pattern of having been sequenced from stress-related cDNA libraries, including dehydration stress in Brassica napus (GenBank acc. no. AAK01359.1), dehydration stress in Arabidopsis (AAM65891.1 and AAM62648.1), the hypersensitive response following infection by tobacco mosaic virus in Capsicum annuum (AAF63515.1 and AAO49266.1), the response to elicitors in Nicotiana tabacum (BAB13708.1), and cold stress in Capsicum annuum (AAR83862.1). Interestingly, CSWR-6 is also distantly related (E value = 0.002) to several rat and human genes that underlie polycystic kidney disease.

The key feature of the protein is the lipoxygenase homology (LH2) domain, also called the PLAT (polycystin-1, lipoxygenase, alpha-toxin) domain or the PLAT/LH2 domain (Fig. 6). This domain is found in a variety of membrane- or lipid-associated proteins. The predicted localization signal of CSWR-6 would target it to one of several membranous structures involved in transport within the cell (e.g., the Golgi apparatus or endoplasmic reticulum).

CSWR-7 Chloroplast-specific ribosomal protein

CSWR-7 is homologous (E value = 1.0E-100) to a family of proteins containing the Sigma 54 modulation protein and the chloroplast-specific ribosomal protein S30 (Johnson et al., 1990). This family contains a number of transcripts known to be repressed by light (Tan et al., 1994). CSWR-7 has a chloroplast transit peptide (Fig. 6), consistent with its prevalence in green tissues (Figs. 3 and 4). CSWR-7 also has weak similarity (E value = 0.16) to phosphatidylinositol 4-kinase and to a calmodulin-binding protein family that contains an IQ calmodulin-binding motif.

CSWR-8 Alpha/beta fold family protein

CSWR-8 encodes a 22 kD protein related to the alpha/beta fold superfamily of proteins (Fig. 6), which includes a wide range of catalytic enzymes. CSWR-8 is predicted to have catalytic activity and most likely acts as a hydrolase. No transit peptide was detected, and no close homologue has been functionally characterized. CSWR-8 is most abundant in root tissue (Figs. 3 and 4).

CSWR-9 Histidine triad family protein

CSWR-9 encodes a 16 kD protein of the histidine triad (HIT) family (Fig. 6), members of which have been implicated in cell cycle regulation. However, the molecular functions of close homologues of CSWR-9 are unknown. CSWR-9 is a low-abundance transcript in all tissues examined (Fig. 3) and contains a predicted transit peptide at the amino terminus (Fig. 6).


We have characterized transcripts that accumulate systemically in tomato leaves within 1 hour after a distant leaf is flamed. These transcripts are likely involved in a wide variety of metabolic functions, and since all are present in unwounded tissue, they are probably functional under non-stress conditions. Nevertheless, homologues in other organisms have been associated with defense (CSWR-1), oxidative stress (CSWR-2), salt stress (CSWR-4), removal of toxins (CSWR-5), and numerous other stresses (CSWR-6, etc.).

Because the systemic response to fire damage has not been well characterized, our results lead us to make an important general observation. Previous experiments using flame wounding have treated the stimulus as a “generalized” wound (used largely for experimental convenience), with the intention of simulating wounds from herbivores or pathogens. However, fire is nearly ubiquitous in terrestrial ecosystems, and plants have evolved mechanisms to deal with fire damage. This is well documented in the ecological literature, and fire-response mechanisms are thought to be ancestral characteristics (Bond and van Wilgen, 1996). Therefore, the fact that several of the CSWR genes shown here, as well as Pin 1 and CMBP (Vian et al., 1999), have been associated with responses to multiple wounds and stresses suggests that fire damage elicits a systemic response with components similar to those of other wound and stress responses in the natural environment. Currently, little is known about how responses to fire damage might differ at the molecular level from other wound and stress responses. This will be an important topic for future work.


Materials and Methods

Plant material, growth conditions, and tissue collection

Tomato plants (Lycopersicon esculentum cv. Heinz) were obtained from Stokes Seeds (Buffalo, New York) and grown under controlled conditions in the NCSU Phytotron on a gravel/Peat-Lite substrate (developed at Cornell University, Ithaca, NY). Growth chambers maintained an environment of 16 h light (300 µmol m-2 s-1) at 26 °C and 8 h dark at 21 °C. A butane lighter flame was held for 2 seconds 1 cm below the third leaf of 3- to 4-week-old plants (about 12 cm in height, with the fourth leaf not fully expanded), causing immediate, localized tissue damage. For construction of the subtractive cDNA library, the fourth leaf was harvested from wounded and control plants 1 hour after wounding and immediately frozen in liquid nitrogen. For RT-PCR experiments, the fourth leaf of individual plants was harvested at 0, 5, 10, 20, 30, 60, 180, and 360 minutes after wounding and immediately frozen in liquid nitrogen. For comparison of RNA levels in different organs of unwounded plants, roots, stems (up to the cotyledonary node), and leaves were harvested from 3 plants and pooled in liquid nitrogen before grinding. All experiments were performed in duplicate.

Subtractive cDNA library construction, screening and sequencing

A subtractive library was constructed using a PCR-Select cDNA subtraction kit (Clontech Laboratories, Inc., Palo Alto, CA, USA) such that wound-specific cDNAs were preferentially amplified. Subtraction was performed according to the manufacturer’s recommendations, with only slight modifications as described in Vian et al. (1999). The final PCR-amplified cDNAs were ligated into the T/A vector pT-7 Blue (Novagen, Madison, WI) for 2 h at room temperature using T4 DNA ligase (Gibco-BRL).

Library clones were grown on LB/ampicillin plates, and single colonies were picked and grown in LB/ampicillin suspension culture. Most clone cDNAs were prepared for sequencing by PCR amplification (35 cycles of 94 °C for 45 s denaturing, 56 °C for 60 s annealing, and 72 °C for 60 s extension) using primers specific for the pT-7 Blue cloning site (ACCATGATTACGCCAAGCTC and TAAAACGACGGCCAGTGAAT) and purified with a QIAquick PCR Purification kit (Qiagen, Valencia, CA). Other clones were prepared for sequencing using plasmid mini-preps (Qiagen, Valencia, CA). Sequencing was performed in forward (T7 primer) and reverse (pUC/M13 reverse primer) directions using a Beckman/Coulter CEQ2000XL 8-capillary DNA sequencer (dye-terminator chemistry) at the Genomics Core Research Facility of the University of Nebraska-Lincoln.

DNA sequence analysis and data mining

All sequences were screened for vector, primer, and adaptor contamination (Coker and Davies, 2002) using VecScreen (http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html). Blast searches (version 2.2.7; Altschul et al., 1997) were performed against GenBank nucleotide databases using tBlastx and Blastn, and against protein databases using Blastx. The initial short sequences of the nine cDNAs presented here could not be identified, and were therefore used to search expressed sequence tags (ESTs) in the TIGR Tomato Gene Index (version 9.0). Identical matches allowed the putative extension of our sequences using consensus sequence information.
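VecScreen itself compares query sequences against the UniVec vector database using Blast; the Python sketch below is a much simpler stand-in, not VecScreen's algorithm. It flags exact matches, on either strand, to the two pT-7 Blue cloning-site primers listed in the library construction methods. Exact matching will miss partial or error-containing primer remnants, the adaptor sequences of the subtraction kit would have to be added to the dictionary, and all names used here are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing A, C, G, T, or N."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


# pT-7 Blue cloning-site primers, as given in the text.
PT7BLUE_PRIMERS = {
    "pT7Blue_fwd": "ACCATGATTACGCCAAGCTC",
    "pT7Blue_rev": "TAAAACGACGGCCAGTGAAT",
}


def find_primer_hits(read, primers=PT7BLUE_PRIMERS):
    """Return (primer name, strand, 0-based position) for every exact match
    of a primer or its reverse complement within the read."""
    read = read.upper()
    hits = []
    for name, primer in primers.items():
        for strand, probe in (("+", primer.upper()), ("-", reverse_complement(primer))):
            start = read.find(probe)
            while start != -1:
                hits.append((name, strand, start))
                start = read.find(probe, start + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical read carrying a forward-primer remnant at position 3.
    print(find_primer_hits("GGG" + "ACCATGATTACGCCAAGCTC" + "TTT"))
```

Reads with any hit would then be trimmed or inspected before further analysis.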


Verification of consensus sequences

To verify consensus sequences, PCR primers (see Table 2) were designed to amplify the entire predicted open reading frames using Primer3 (http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi) and OligoAnalyzer 3.0 (http://207.32.43.70/biotools/oligocalc/oligocalc.asp). PCR was performed for 35 cycles in an MJ Research MiniCycler (94 °C for 45 s denaturing, 54 °C for 60 s annealing, 72 °C for 60 s extension) using Platinum PCR Supermix (Invitrogen, Carlsbad, CA) and a pooled tomato cDNA sample as template. PCR products were sequenced as described above in forward and reverse directions using the respective primers in Table 2.

Real-time RT-PCR assays

Real-time reverse transcriptase polymerase chain reaction (real-time RT-PCR) allows sensitive detection and accurate quantification of low-abundance mRNAs (Bustin, 2000). Total RNA was extracted using an RNeasy Plant Mini kit (Qiagen, Valencia, CA, USA) and further purified using a DNA-free kit (Ambion, Austin, TX, USA). To synthesize cDNA, reverse transcription was performed on 10 µl of RNA sample (at 50 ng/µl) using an Omniscript RT kit (Qiagen, Valencia, CA, USA) with the primer TTCTAGAATTCAGCGGCCGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN in the presence of RNase inhibitor (ABI, Foster City, CA, USA). The cDNA samples were then diluted to 2.5 ng/µl.

PCR primers were designed using OligoAnalyzer 3.0 to amplify 80-120 bp products located 100-1000 bp from the 3’ ends of cDNAs, in regions that we had verified through DNA sequencing (Table 2). All primers had melting temperatures of 59-60 °C, 3’ G/C caps, 40-60% G/C content in the last 5 bases at the 3’ end, and 40-60% G/C content overall, and matched only one tentative consensus sequence in the TIGR TGI (Table 2).
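The composition rules above are straightforward to check programmatically. This Python sketch tests the 3’ G/C cap and the two G/C-content windows; it does not compute melting temperature (OligoAnalyzer uses a thermodynamic model that is not reproduced here), and the function names are hypothetical. As a usage example, it is applied to the actin forward primer used in this study (TGGTCGTACCACCGGTATTGTG).

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def check_primer_composition(primer):
    """Return pass/fail results for the G/C composition rules
    (melting temperature is not checked here)."""
    p = primer.upper()
    return {
        "3prime_gc_cap": p[-1] in "GC",
        "gc_last5_40_60": 0.4 <= gc_fraction(p[-5:]) <= 0.6,
        "gc_overall_40_60": 0.4 <= gc_fraction(p) <= 0.6,
    }


if __name__ == "__main__":
    # Actin forward primer from this study.
    result = check_primer_composition("TGGTCGTACCACCGGTATTGTG")
    print(result, all(result.values()))
```

Because the 3’ window is only 5 bases long, the 40-60% rule in practice means exactly 2 or 3 G/C bases at the 3’ end.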

We recently published an analytical method for identifying housekeeping controls in various tissues of tomato plants (Coker and Davies, 2003). In the present study, an actin gene (GenBank acc. no. U60480.1) was used as a housekeeping control to confirm the consistency of tissue collection, mRNA extraction, and reverse transcription. Primers (TGGTCGTACCACCGGTATTGTG and AATGGCATGTGGAAGGGCATAC) were designed to amplify a 91 bp product. The forward primer spanned an exon-exon junction (the position of an intron in genomic DNA) to ensure that genomic DNA was not amplified. In addition, no-RT controls were included as negative controls to check for contamination by genomic DNA. PCR was performed in an ABI Prism® 7900HT Sequence Detection System (95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s denaturing and 60 °C for 60 s annealing/extension) using 25 µl 1x SYBR Green PCR Mastermix (ABI, Foster City, CA, USA), 2 µl cDNA, and 3 µl primers (0.25 µM). Dissociation curve analysis was performed for each sample following PCR. Data were analyzed using ABI SDS software and quantified relative to the standard curve of a serial dilution.
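Quantification against a standard curve amounts to inverting the fitted line. The Python sketch below is a hedged illustration, not the ABI SDS software's implementation: it converts Ct values to relative quantities using an assumed slope and intercept and expresses a later timepoint as a fold change over the 0-minute sample. Dividing each target quantity by a reference-gene (for example actin) quantity before taking the ratio is one common option; whether such normalization was applied here is not stated, so it appears only as optional, hypothetical parameters.

```python
def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def fold_change(sample_ct, baseline_ct, slope, intercept,
                sample_ref_q=1.0, baseline_ref_q=1.0):
    """Fold change of a sample over the baseline (e.g. the 0-minute timepoint).

    sample_ref_q and baseline_ref_q are optional reference-gene quantities,
    each obtained from that gene's own standard curve; leaving them at 1.0
    means no normalization is applied.
    """
    sample_q = quantity_from_ct(sample_ct, slope, intercept) / sample_ref_q
    baseline_q = quantity_from_ct(baseline_ct, slope, intercept) / baseline_ref_q
    return sample_q / baseline_q


if __name__ == "__main__":
    # Hypothetical standard curve and Ct values, for illustration only.
    slope, intercept = -3.32, 38.0
    print(f"{fold_change(24.0, 25.0, slope, intercept):.2f}-fold")
```

With a slope near -3.32, a sample that crosses threshold one cycle earlier than the baseline corresponds to roughly a 2-fold increase.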

Relative expression analyses

The TIGR TGI database (version 9.0) includes 27 tomato cDNA libraries with large sample sizes (>500 ESTs), constructed from a variety of tissue types and developmental stages. We searched these 27 libraries for particular EST sequences and calculated relative expression values based on the number of ESTs found in a given population. Analyses were performed as described in Coker et al. (2003).
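The exact calculation is given in Coker et al. (2003); the Python sketch below only assumes a common convention, expressing the EST count for a transcript in each library per 10,000 ESTs sequenced from that library, and keeps only libraries with more than 500 ESTs. The function name, library names, and counts are hypothetical.

```python
def relative_expression(est_counts, library_sizes, per=10_000, min_size=500):
    """ESTs for one transcript per `per` ESTs in each sufficiently large library.

    est_counts: library name -> number of ESTs matching the transcript
    library_sizes: library name -> total number of ESTs in that library
    """
    return {
        lib: est_counts.get(lib, 0) * per / size
        for lib, size in library_sizes.items()
        if size > min_size  # skip small libraries (500 ESTs or fewer)
    }


if __name__ == "__main__":
    # Hypothetical libraries and counts, for illustration only.
    sizes = {"leaf_library": 8000, "root_library": 4000, "small_library": 300}
    counts = {"leaf_library": 4, "root_library": 1}
    print(relative_expression(counts, sizes))
```

Normalizing by library size lets counts from libraries of very different depths be compared directly.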

Polypeptide sequence analysis

Alignments and other basic sequence analyses were performed using Vector NTI 7.1 (Informax, Bethesda, MD). Searches to determine protein families, domains, and functional sites were performed using the InterPro database (www.ebi.ac.uk/interpro; Mulder et al., 2003), which integrates PROSITE, Pfam, PRINTS, ProDom, SMART, and TIGRFAMs. Structural analyses included the prediction of the following: localization signals using TargetP (www.cbs.dtu.dk/services/TargetP/; Emanuelsson et al., 2000); presence and orientation of transmembrane regions using PHDhtm (http://cubic.bioc.columbia.edu/predictprotein/; Rost et al., 1996), HMMTOP (http://www.enzim.hu/hmmtop/; Tusnády and Simon, 2001), and the hydropathy index of Kyte and Doolittle (1982); alpha helices and beta strands using PROFsec (http://cubic.bioc.columbia.edu/predictprotein/; Rost, 1996); possible interacting proteins using DIP (http://dip.doe-mbi.ucla.edu/; Xenarios et al., 2002); and coiled-coils and leucine zippers using COILS (http://cubic.bioc.columbia.edu/predictprotein/; Lupas, 1996) and 2ZIP (http://2zip.molgen.mpg.de/index.html; Bornberg-Bauer et al., 1998).

The overall strategy for cDNA analyses is outlined in Figure 1.

Acknowledgements

We thank Heike Winter-Sederoff and Raul Salinas for their assistance in accessing equipment, and Sophia Clotho for her advice.


Literature Cited

Adams MD, Kerlavage AR, Fleischmann RD, Fuldner RA, Bult CJ, Lee NH, Kirkness EF, Weinstock KG, Gocayne JD, White O, et al. (1995) Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377: 3-174.

Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815.

Balsera M, Arellano JB, Gutierrez JR, Heredia P, Revuelta JL, De Las Rivas J (2003) Structural analysis of the PsbQ protein of photosystem II by Fourier transform infrared and circular dichroic spectroscopy and by bioinformatic methods. Biochem 42: 1000-1007.

Becker-Andre M, Schulze-Lefert P, Hahlbrock K (1991) Structural comparison, modes of expression, and putative cis-acting elements of the two 4-coumarate:CoA ligase genes in potato. J Biol Chem 266: 8551-8559.

Bertram L, Lercari B (2000) Phytochrome A and phytochrome B1 control the acquisition of competence for shoot regeneration in tomato hypocotyl. Plant Cell Reports 19: 604-609.

Bick J-A, Aslund F, Chen Y, Leustek T (1998) Glutaredoxin function for the carboxyl-terminal domain of the plant-type 5’-adenylsulfate reductase. Proc Natl Acad Sci USA 95: 8404-8409.

Bick JA, Setterdahl AT, Knaff DB, Chen Y, Pitcher LH, Zilinskas BA, Leustek T (2001) Regulation of the plant-type 5'-adenylyl sulfate reductase by oxidative stress. Biochem 40: 9040-9048.

Bobrowicz P, Wysocki R, Owsianik G, Goffeau A, Ulaszewski S (1997) Isolation of three contiguous genes, ACR1, ACR2 and ACR3, involved in resistance to arsenic compounds in the yeast Saccharomyces cerevisiae. Yeast 13: 819-828.

Bond WJ, van Wilgen BW (1996) Fire and plants. Chapman & Hall: London.

Bornberg-Bauer E, Rivals E, Vingron M (1998) Computational approaches to identify leucine zippers. Nucleic Acids Res 26: 2740-2746.

Braam J, Davis RW (1990) Rain-, wind-, and touch-induced expression of calmodulin-related genes in Arabidopsis. Cell 60: 357-364.

Branen JK, Shintani DK, Engeseth NJ (2003) Expression of antisense acyl carrier protein-4 reduces lipid content in Arabidopsis leaf tissue. Plant Physiol 132: 748-756.


Bustin SA (2000) Absolute quantification of mRNA using real-time reverse transcription polymerase chain reaction assays. J Molecular Endocrinology 25: 169-193.

Coker JS, Davies E (2002) Correspondence re: A.H. Ree et al., Expression of a novel factor in human breast cancer cells with metastatic potential (Cancer Res., 59: 4675-4680, 1999). Cancer Res 62: 4164-4165.

Coker JS, Jones D, Davies E (2003) Identification, conservation, and relative expression of V-ATPase cDNAs in tomato plants. Plant Molecular Biology Reporter 21: 145-158.

Coker JS, Davies E (2003) Selection of candidate housekeeping controls in tomato plants using EST data. Biotechniques 35: 740-748.

DeBano LF, Neary DG, Ffolliott PF (1998) Fire effects on ecosystems. Wiley & Sons, Inc.: New York.

Drewa PB, Platt WJ, Moser EB (2002) Fire effects on resprouting of shrubs in headwaters of southeastern longleaf pine savannas. Ecology 83: 755-767.

Edwards K, Cramer CL, Bolwell GP, Dixon RA, Schuch W, Lamb CJ (1985) Rapid transient induction of phenylalanine ammonia-lyase mRNA in elicitor-treated bean cells. Proc Natl Acad Sci USA 82: 6731-6735.

Emanuelsson O, Nielsen H, Brunak S, von Heijne G (2000) Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 300: 1005-1016.

Espartero J, Pintor-Toro JA, Pardo JM (1994) Differential accumulation of S-adenosylmethionine synthetase transcripts in response to salt stress. Plant Mol Biol 25: 217-227.

Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92-100.

Graham JS, Hall G, Pearce G, Ryan CA (1986) Regulation of synthesis of proteinase inhibitors I and II mRNAs in leaves of wounded tomato plants. Planta 169: 399-405.

Green TR, Ryan CA (1972) Wound-induced proteinase inhibitor in plant leaves – possible defense mechanism against insects. Science 175: 776-777.

Hagenbuch B, Stieger B, Foguet M, Lubbert H, Meier PJ (1991) Functional expression cloning and characterization of the hepatocyte Na+/bile acid cotransport system. Proc Natl Acad Sci USA 88: 10629-10633.


Herde O, Atzorn R, Fisahn J, Wasternack C, Willmitzer L, Peña-Cortes H (1996) Localized wounding by heat initiates the accumulation of proteinase inhibitor II in abscisic acid-deficient plants by triggering jasmonic acid biosynthesis. Plant Physiol 112: 853-860.

Howe GA, Lee GI, Itoh A, Li L, DeRocher AE (2000) Cytochrome P450-dependent metabolism of oxylipins in tomato. Cloning and expression of allene oxide synthase and fatty acid hydroperoxide lyase. Plant Physiol 123: 711-724.

Johnson CH, Kruft V, Subramanian AR (1990) Identification of a plastid-specific ribosomal protein in the 30S subunit of chloroplast ribosomes and isolation of the cDNA clone encoding its cytoplasmic precursor. J Biol Chem 265: 12790-12795.

Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157: 105-132.

Lawton MA, Lamb CJ (1987) Transcriptional activation of plant defense genes by fungal elicitor, wounding and infection. Mol Cell Biol 7: 335-341.

Lee D, Douglas CJ (1996) Two divergent members of a tobacco 4-coumarate:coenzyme A ligase (4CL) gene family. cDNA structure, gene inheritance and expression, and properties of recombinant proteins. Plant Physiol 112: 193-205.

Lee SM, Suh S, Kim S, Crain RC, Kwak JM, Nam HG, Lee YS (1997) Systemic elevation of phosphatidic acid and lysophospholipid levels in wounded plants. Plant J 12: 547-556.

León J, Enrique R, Sánchez-Serrano JJ (2001) Wound signalling in plants. J Exp Bot 52: 1-9.

Low RK, Prakash AP, Swarup S, Goh CJ, Kumar PP (2001) A differentially expressed bZIP gene is associated with adventitious shoot regeneration in leaf cultures of Paulownia kawakamii. Plant Cell Reports 20: 696-700.

Lu M, Holliday S, Zhang L, Dunn WA, Gluck SL (2001) Interaction between aldolase and vacuolar H+-ATPase. J Biol Chem 276: 30407-30413.

Lupas A (1996) Prediction and analysis of coiled-coil structures. Methods in Enzymology 266: 513-525.

Mayer K, Schuller C, Wambutt R, Murphy G, Volckaert G, Pohl T, et al. (1999) Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana. Nature 402: 769-777.

Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, et al. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res 31: 315-318.

Nishiuchi T, Suzuki K, Kitajima S, Sato F, Shinshi H (2002) Wounding activates immediate early transcription of genes for ERFs in tobacco plants. Plant Mol Biol 49: 473-482.

Orozco-Cardenas M, Ryan CA (1999) Hydrogen peroxide is generated systemically in plant leaves by wounding and systemin via the octadecanoid pathway. Proc Natl Acad Sci USA 96: 6553-6557.

Pearce G, Strydom D, Johnson S, Ryan CA (1991) A polypeptide from tomato leaves induces wound-inducible proteinase inhibitor proteins. Science 253: 895-898.

Peña-Cortés H, Willmitzer L, Sanchez-Serrano J (1991) Abscisic acid mediates wound induction but not developmental-specific expression of the proteinase inhibitor II gene family. Plant Cell 3: 963-972.

Peres LE-P, Morgante PG, Vecchi C, Kraus JE, van Sluys MA (2001) Shoot regeneration capacity from roots and transgenic hairy roots of tomato cultivars and wild related species. Plant Cell Tissue and Organ Culture 65: 37-44.

Platt WJ, Evans GW, Davis MM (1988) Effects of fire season on flowering of forbs and shrubs in longleaf pine forests. Oecologia 76: 353-363.

Preston CA, Baldwin IT (1999) Positive and negative signals regulate germination in the post-fire annual, Nicotiana attenuata. Ecology 80: 481-494.

Reymond P, Weber H, Damond M, Farmer EE (2000) Differential gene expression in response to mechanical wounding and insect feeding in Arabidopsis. Plant Cell 12: 707-719.

Rost B (1996) PHD: predicting one-dimensional protein structure by profile based neural networks. Methods in Enzymology 266: 525-539.

Ryan CA (1974) Assay and biochemical properties of the proteinase inhibitor inducing factor, a wound hormone. Plant Physiol 54: 328-332.

Ryan CA, Farmer EE (1991) Oligosaccharide signals in plants: a current assessment. Annu Rev Plant Physiol Plant Mol Biol 42: 651-674.

Schaller A, Ryan CA (1996) Molecular cloning of a tomato leaf cDNA encoding an aspartic protease, a systemic wound response protein. Plant Mol Biol 31: 1073-1077.

Sugihara K, Hanagata N, Dubinsky Z, Baba S, Karube I (2000) Molecular characterization of cDNA encoding oxygen evolving enhancer protein 1 increased by salt treatment in the mangrove Bruguiera gymnorrhiza. Plant Cell Physiol 41: 1279-1285.

Stanković B, Davies E (1996) Both action potentials and variation potentials induce proteinase inhibitor gene expression in tomato. FEBS Lett 390: 275-279.

Stanković B, Vian A, Henry-Vian C, Davies E (2000) Molecular cloning and characterization of a tomato cDNA encoding a systemically wound-inducible bZIP DNA-binding protein. Planta 212: 60-66.

Stennis MJ, Chandra S, Ryan CA, Low PS (1998) Systemin potentiates the oxidative burst in cultured tomato cells. Plant Physiol 117: 1031-1036.

Takashina T, Suzuki T, Egashira H, Imanishi S (1998) New molecular markers linked with the high shoot regeneration capacity of the wild tomato species Lycopersicon chilense. Breeding Science 48: 109-113.

Tan X, Varughese M, Widger WR (1994) A light-repressed transcript found in Synechococcus PCC 7002 is similar to a chloroplast-specific subunit protein and to a transcription modulator protein associated with Sigma 54. J Biol Chem 269: 20905-20912.

Taylor IB (1986) Biosystematics of the tomato. In: Atherton JG, Rudich J (eds) The tomato crop: a scientific basis for improvement. Chapman and Hall Ltd, New York.

Taylor JLS, van Staden J (1998) Plant-derived smoke solutions stimulate the growth of Lycopersicon esculentum roots in vitro. Plant Growth Regulation 26: 77-83.

Tusnády GE, Simon I (2001) The HMMTOP transmembrane topology prediction server. Bioinformatics 17: 849-850.

USDI-National Park Service (1992) Fire monitoring handbook. Natl Park Serv, Western Region. San Francisco, CA. 134 p. plus appendices.

Van der Hoeven R, Ronning C, Giovannoni J, Martin G, Tanksley S (2002) Deductions about number, organization, and evolution of genes in the tomato genome based on analysis of a large expressed sequence tag collection and selective genomic sequencing. Plant Cell 14: 1441-1456.

Verdaguer D, Ojeda F (2002) Root starch storage and allocation patterns in seeder and resprouter seedlings of two Cape Erica (Ericaceae) species. Amer J Botany 89: 1189-1196.

Vian A, Henry-Vian C, Schantz R, Ledoigt G, Frachisse JM, Desbiez MO (1996) Is membrane potential involved in calmodulin gene expression after external stimulation in plants? FEBS Lett 380: 93-96.

Vian A, Henry-Vian C, Davies E (1999) Rapid and systemic accumulation of chloroplast mRNA-binding protein transcripts after flame stimulus in tomato. Plant Physiol 121: 517-524.

Wells BW (2002) The natural gardens of North Carolina. Rev. ed. UNC Press: Chapel Hill.

Wildon DC, Thain JF, Minchin PEH, Gubb IR, Reilly AJ, Skipper YD, Doherty HM, O’Donnell PJ, Bowles DJ (1992) Electrical signaling and systemic proteinase inhibitor induction in the wounded plant. Nature 360: 62-65.

Xenarios I, Salwinski L, Duan XJ, Higney P, Kim S, Eisenberg D (2002) DIP: The Database of Interacting Proteins. A research tool for studying cellular networks of protein interactions. Nucleic Acids Res 30: 303-305.

Table I. Sequence extension and polypeptide deduction for unidentifiable tomato cDNA fragments that are "candidates for the systemic wound response" (CSWR). Tentative consensus sequences (labeled TC) corresponding to each cDNA fragment were identified in the TIGR Tomato Gene Index and used for GenBank searches and polypeptide analyses. Putative protein and cDNA sequences were annotated and deposited into GenBank under the given accession numbers.

Library clone   TIGR TGI match     GenBank match (Blastx)          GenBank submission
Name    Length  Acc. #     Length  Identity        Protein acc. #  Amino  kD     pI    Acc. #
        (bp)               (bp)                                    acids
CSWR-1  229     TC128541   596     57% (78/136)    AAL25091.1      133    14.2   4.9   AY568716
CSWR-2  189     TC116602   1830    83% (389/467)   AAB05871.2      461    51.1   6.0   AY568717
CSWR-3  333     TC124827   776     74% (132/178)   AAM14118.1      238    25.0   4.8   AY568718
CSWR-4  119     TC116141   966     67% (156/232)   S00008          230    24.6   9.6   AY568719
CSWR-5  647     TC124664   1716    77% (319/414)   NP_850089.1     407    43.7   8.9   AY568720
CSWR-6  206     TC123928   733     83% (156/186)   AAR83862.1      184    20.3   6.1   AY568721
CSWR-7  303     TC117392   1454    63% (181/286)   AAK43963.1      311    34.8   5.9   AY568722
CSWR-8  59      TC128899   662     65% (128/196)   NP_195379.1     208    22.4   7.0   AY568723
CSWR-9  543     TC121082   702     43% (56/130)    ZP_00019821.1   150    16.1   7.1   AY568724

Table II. PCR primers specific to 9 novel tomato cDNAs that were used to verify putative open reading frame sequences and perform real-time RT-PCR experiments.

        Verification of putative ORF sequence                           Real-time RT-PCR
cDNA    Forward primer             Reverse primer            Size (bp)  Forward primer            Reverse primer            Size (bp)
CSWR-1  CCATTTCTTCTCTCTGCATTTCTC   GCAAAAAGAATTCAATCCAAGACC  548        AAACGGAGGCACTGTGAAGTTG    CAACACGAAGGCGGGATGTTC     89
CSWR-2  TGACAAGCAATTTCTTTGCTG      TGTCAAAACAATGGTGTGATTG    1500       TGTAAATGGCGCTGCCCAAAC     AGGTTCTCAACTCCAGGCTTACTC  99
CSWR-3  ACACAGCCAATCGAAGAAGC       CAAGCAAAATGATTTGTCCTAAG   850        CGGGCTTCAACTGACGATTCTG    ACCAACCACTTTGGTGTTGTCC    111
CSWR-4  CACCAAAACAAAAAGGTCCTG      CAAGTTGAGGCTCTCGGATG      850        TCACTGTCAGAGCCCAACAGG     AAGCTGCTTTAGCAACGGAACC    105
CSWR-5  AACGAACCCTGCCCTAAACT       CAGCCAAGAAATGCAAACAA      1360       CGGCACTCGGATTTCTACTTGC    ACCAAGTGCCATGCAGACAAC     92
CSWR-6  CATTTTCTAGAGAAAGAGCACAAGG  CCATGACAGCAAAGACATGC      687        ATGCAACCAGCAGTTGTTCACC    AGCTCACCGGACTCATACTTGTTC  111
CSWR-7  CGAGCAACAAAGCAACTGTG       CTCTCAAGGAGTGAACATTATGC   1160       TCCGGAATGAGGAAACTGGTGAG   TCAACCTCCAAGGGCTCTAACTTC  118
CSWR-8  GGTGAACTTGGTTGAAGCAC       TGAAAATCCCCAAACCATTG      590        GATTTAGTTGAGGCGTTGGTGGTG  AAACCATTGAGCGTGGTAGTGC    80
CSWR-9  CTCCACCGATGGTGAAAATC       CATACCTCGATCTGAACGACAG    531        TTTGGGCACTCGCTTGTCATC     TTGAACTCATGGCAGCCACAAC    85

[Figure 1 flowchart, reconstructed from box labels]
Subtractive cDNA library of tomato genes up-regulated during a systemic wound response
  Clone isolation and sequencing
  Sequence quality control: VecScreen; bacterial database searches
  Blast searches of GenBank, sorting clones into: ESTs from known tomato genes; ESTs homologous to known genes; unidentifiable ESTs
  Unidentifiable ESTs:
    Sequence extension using the TIGR TGI; relative expression analysis using the TIGR TGI
    Sequence verification (PCR & sequencing)
    Real-time RT-PCR (6 hr. timecourse); housekeeping controls; Blast searches of GenBank
    Protein family analysis: PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMS
    Structural analysis: localization signals (TargetP); transmembrane regions (PHDhtm, HMMTOP); alpha helices/beta sheets (PROFsec); interacting proteins (DIP); coiled-coils/leucine zippers (COILS, 2ZIP)

Figure 1. Strategy to identify and characterize cDNAs up-regulated in tomato leaf tissue during a systemic wound response to fire damage. Clones were sequenced from a subtractive cDNA library and separated into 3 classes based on how well they could be identified by Blast searches. Sequences for 9 unidentifiable cDNA fragments were extended and verified, and used for further Blast searches, protein family analysis, and structural analysis. Expression studies included expressed sequence tag analysis using the TIGR TGI and real-time RT-PCR of leaf tissue over a 6-hour timecourse after flame wounding.

[Figure 2 gel image: lanes 1-13; ladder bands marked at 2000, 1200, 800, 400, and 200 bp]

Figure 2. Confirmation of the existence of 9 putative consensus sequences for unknown tomato cDNAs. PCR was performed using a pooled cDNA sample and the primer pairs shown in Table II. Products were run on a 2% agarose gel stained with ethidium bromide. Lanes 1 and 13 show 8 µl of Low DNA Mass Ladder (Invitrogen), lane 11 is a positive PCR control, and lane 12 is a negative control (no PCR primers). Lanes 2-10 contain PCR products corresponding to the putative open reading frames for CSWR-1 through CSWR-9.

[Figure 3 bar chart: y-axis, # ESTs per 1000 total ESTs (0-0.5); x-axis, cDNA (CSWR-1 through CSWR-9 and translation initiation factor 5A-3); bars grouped by tissue: leaves, shoots, flowers, culture/callus, fruits, roots]

Figure 3. Expressed sequence tag analysis of 9 cDNAs that are candidates for the systemic wound response (CSWR). Each bar represents the relative expression value for a particular gene from cDNA libraries in the TIGR TGI, grouped according to tissue. For example, the TIGR TGI contains 2 CSWR-1 ESTs from leaves, representing 0.1 CSWR-1 ESTs for every 1000 total ESTs. CSWR-4 expression levels for leaves and shoots are off the scale (shown by an arrow) at 2.2 and 1.9, respectively. Translation initiation factor 5A-3 (TIGR acc. no. TC124277) is included for comparison as a ubiquitously expressed cDNA with a comparable expression level.
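The normalization behind Figure 3 is a simple ratio: ESTs for one gene divided by all ESTs from the same tissue, scaled to 1000. A minimal Python sketch, assuming only what the caption describes; the function name is hypothetical, and the 20,000-EST leaf total is an illustrative value chosen so that the caption's example (2 CSWR-1 leaf ESTs, 0.1 per 1000) reproduces, not a figure taken from the TIGR TGI:

```python
def ests_per_thousand(gene_ests: int, total_ests: int) -> float:
    """Relative expression: ESTs for one gene per 1000 total ESTs from the same tissue."""
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    return 1000 * gene_ests / total_ests

# Hypothetical leaf EST total, chosen only to reproduce the caption's example.
leaf_total = 20_000
print(ests_per_thousand(2, leaf_total))  # 0.1
```

Because each tissue pool has a different total size, raw EST counts are not comparable across tissues; dividing by the pool size is what puts the bars on a common scale.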

[Figure 4 gel images: one row per cDNA (CSWR-1 through CSWR-9); lanes: root, stem, leaf]

Figure 4. Organ-specific relative abundance of CSWR-1 through CSWR-9 in unwounded tomato plants. Each image shows PCR products for a given cDNA using root, stem, and leaf cDNA as template. Each band represents 8 µl of PCR product (35 cycles) on a 1.5% agarose gel stained with ethidium bromide.

[Figure 5 panels: Pin 1 and CSWR-1 through CSWR-9, each plotting relative transcript accumulation (y-axis) against time (min.) after flame wounding (x-axis, 0-360)]

Figure 5. Systemic transcript accumulation of 9 tomato cDNAs (CSWR-1 through CSWR-9) in leaf 4 after flame wounding leaf 3. Two real-time RT-PCR experiments (solid and dotted lines) were performed on leaf mRNA from 0, 5, 10, 20, 30, 60, 180, and 360 minute timepoints and quantified relative to the standard curve of a serial dilution. Pin 1 (GenBank accession no. K03290) is a well-documented systemic wound gene shown for comparison. Error bars indicate ± standard error (n=2).
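The caption states only that values were quantified relative to the standard curve of a serial dilution. The usual form of that calculation is sketched below in Python, assuming that threshold cycle (Ct) values were fitted linearly against log10 of the dilution; the dilution series and Ct values are invented, the function names are hypothetical, and any further normalization (for example to a housekeeping control) is not modeled.

```python
import math

def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def quantity_from_ct(ct, slope, intercept):
    """Read a relative quantity off the fitted standard curve."""
    return 10 ** ((ct - intercept) / slope)

def amplification_efficiency(slope):
    """PCR efficiency implied by the slope (1.0 means perfect doubling each cycle)."""
    return 10 ** (-1 / slope) - 1

# Invented 10-fold dilution series (relative units) and matching Ct values.
dilutions = [1.0, 0.1, 0.01, 0.001]
standard_cts = [18.00, 21.32, 24.64, 27.96]
slope, intercept = fit_standard_curve(dilutions, standard_cts)
sample_quantity = quantity_from_ct(22.98, slope, intercept)
print(round(slope, 2), round(intercept, 2), round(sample_quantity, 4),
      round(amplification_efficiency(slope), 3))
```

A slope near -3.32 corresponds to roughly 100% amplification efficiency; a slope far from that value would suggest that the dilution standards are not behaving as expected.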

[Figure 6 protein schematics; recoverable labels per protein]
CSWR-1 (133 aa): chloroplast l.s.; acyl carrier protein phosphopantetheine domain; phosphopantetheine attachment site (Ser-90)
CSWR-2 (461 aa): chloroplast l.s.; serine-rich region; phosphoadenosine phosphosulfate reductase domain; thioredoxin domain 2
CSWR-3 (238 aa): chloroplast l.s.; transmembrane region
CSWR-4 (230 aa): chloroplast l.s.; transmembrane region; two photosystem II O-evolving complex precursor regions; coiled coil (28 bp)
CSWR-5 (407 aa): mitochondrial or chloroplast l.s.; transmembrane regions; sodium bile acid symporter domain; labeled residues L-307, L-314, L-321, L-328
CSWR-6 (184 aa): secretory l.s.; transmembrane region; lipoxygenase homology domain
CSWR-7 (311 aa): chloroplast l.s.; sigma 54 modulation protein domain
CSWR-8 (208 aa): three transmembrane regions; alpha/beta hydrolase domain
CSWR-9 (150 aa): other l.s.; HIT family domain

Figure 6. Structural and functional prediction of 9 tomato proteins, encoded by CSWR-1 through CSWR-9. White cylinders represent alpha helices, gray ovals represent beta sheets, and block arrows at the amino termini represent localization signals. The locations of other structural elements are shown with black lines beneath each protein. “Other l.s.” refers to a signal peptide localizing somewhere other than chloroplasts, mitochondria, or the secretory pathway. CSWR-1 through CSWR-9 correspond to GenBank entries AY568716 through AY568724. Abbreviations: l.s., localization signal; Chloro., chloroplast; dom., domain; Mito., mitochondrion; prec., precursor; Secr., secretory pathway (i.e., Golgi apparatus or endoplasmic reticulum); transmem. reg., transmembrane region.

Chapter 6

Fire Damage Causes the Systemic Up-regulation of a Set of Highly Conserved Transcripts in Tomato Plants

Jeffrey S. Coker, Alan Vian, and Eric Davies

Alan Vian constructed the subtractive cDNA library. Eric Davies provided guidance and editorial assistance.

This chapter will be submitted for publication.

Abstract

Fire is a natural component of most terrestrial ecosystems and can act as a local wound stimulus to plants. Nevertheless, there have been no previous attempts to catalogue the array of genes that are up-regulated after fire damage. We have constructed a subtractive cDNA library using PCR-based suppression subtractive hybridization and used it to identify 46 different transcripts that are systemically up-regulated in leaves in the first hour after a distant leaf is flame wounded. Compared with the entire tomato transcriptome, these 46 transcripts are very highly conserved (slow-evolving) in plants. All but 5 of the identifiable transcripts fall into 5 classes: enzymes of general metabolism; protein synthesis, modification, and transport; transcription; membrane transport; and photosynthesis and respiration. At least half of the up-regulated transcripts have been previously associated with other types of wounds or stresses. These include phenylalanine ammonia-lyase, 4-coumarate:coenzyme A ligase, S-adenosyl-L-homocysteine hydrolase, S-adenosyl-L-methionine synthetase, catalase, leucine aminopeptidase, phantastica, and a metallothionein-like protein. Most of those that have not been associated with other wounding or stress stimuli are associated with photosynthesis and/or respiration. These include pyruvate kinase, rubisco small subunit, chlorophyll a/b-binding proteins, and subunits of photosystems I and II.

Introduction

Because plants are sessile and cannot escape natural wounding stimuli, they often respond to tissue damage by changes in gene expression (Graham et al., 1986; Braam and Davis, 1990; Schaller and Ryan, 1996; León et al., 2001) in both damaged tissues (local responses) and undamaged tissues (systemic responses). Many systemic wound genes have been previously identified in tomato plants, including proteinase inhibitors (Green and Ryan, 1972), systemin (Pearce et al., 1991), aspartic protease (Schaller and Ryan, 1996), allene oxide synthase and fatty acid hydroperoxide lyase (Howe et al., 2000), and others. Further characterization of the array of systemically up-regulated genes is necessary to better understand plant defense and stress response mechanisms.

Fire is a wounding and stress stimulus that impacts most terrestrial ecosystems, and therefore plants have evolved mechanisms to survive it (Bond and van Wilgen, 1996; DeBano et al., 1998). In fact, some of the most species-rich plant ecosystems (e.g., the herbaceous groundcover of longleaf pine savannahs) require fire to persist (Platt et al., 1988; Drewa et al., 2002). Despite the major impacts of fire on plants, responses to fire damage have not been closely studied at the level of gene expression.

Like many species in the Solanaceae, tomato plants (both wild and cultivated) possess several characteristics that allow herbaceous plants to survive fires. These include being perennial (Taylor, 1986), storing carbohydrate reserves in underground organs (Peres et al., 2001; Verdaguer and Ojeda, 2002), and being able to regenerate shoots from hypocotyls, roots, or other tissues (Takashina et al., 1998; Bertram and Lercari, 2000; Peres et al., 2001). Smoke extract stimulates the growth of tomato roots in vitro (Taylor and van Staden, 1998), and the growth of species within the Solanaceae can be regulated by fire regimes (Preston and Baldwin, 1999). Also, a bZIP gene similar to one we found to be up-regulated by flame wounding (Stanković et al., 2000) has been associated with adventitious shoot regeneration (Low et al., 2001). Thus, tomato plants are currently the preferred model system for work on systemic responses to fire damage.

This work presents 46 tomato transcripts that were up-regulated during a systemic response to fire damage. The transcripts were isolated from a subtractive cDNA library constructed using PCR-based suppression subtractive hybridization, a powerful method for identifying genes that are differentially expressed between two tissues (Diatchenko et al., 1999). Two mRNA populations (tester and driver) are converted to cDNA and hybridized. The hybrid sequences are then removed, leaving unhybridized cDNAs that represent genes more highly expressed in one of the mRNA populations (the tester). The differentially expressed cDNAs are then preferentially amplified by PCR (using tester-specific adaptors) to further minimize the chances of generating false positives. The subtractive cDNA library analyzed here contains cDNAs present at higher levels after flame wounding (tester) than in an unwounded control (driver). More specifically, it contains transcripts systemically up-regulated in leaf 4 of tomato plants in the first hour after leaf 3 was flamed. Transcripts isolated previously from this library include a chloroplast mRNA-binding protein (Vian et al., 1999) and a bZIP DNA-binding protein (Stanković et al., 2000), as well as an acyl carrier protein, adenylyl sulfate reductase, PS II oxygen-evolving complex protein 3, an anion:sodium symporter, a chloroplast-specific ribosomal protein, and a histidine triad family protein (Coker et al., 2004). Here we summarize all unique transcripts isolated from the library, place them into functional categories, assess their extent of conservation in plants, and discuss how the transcripts compare with those up-regulated by other wounds and stresses.
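The tester/driver logic described above can be illustrated with a deliberately simplified mass-balance sketch in Python. It assumes that each driver copy (scaled by a hypothetical driver_excess factor) pairs with and removes one matching tester copy, and that only unhybridized tester remains. Real suppression subtractive hybridization also involves reannealing kinetics, normalization, and adaptor-based suppression PCR, none of which are modeled here; transcript names and abundances are invented.

```python
def unhybridized_tester(tester, driver, driver_excess=1.0):
    """Toy subtraction: remove as many tester copies as there are (scaled) driver copies.

    tester, driver: dicts mapping a transcript name to an abundance (arbitrary units).
    Returns the tester transcripts with a positive leftover amount.
    """
    remaining = {}
    for transcript, tester_amount in tester.items():
        removed = driver.get(transcript, 0) * driver_excess  # copies lost to tester/driver hybrids
        leftover = tester_amount - removed
        if leftover > 0:
            remaining[transcript] = leftover
    return remaining

# Invented abundances: a housekeeping transcript, a wound-induced one, and a tester-only one.
tester = {"housekeeping": 100, "wound_induced": 60, "tester_only": 8}
driver = {"housekeeping": 100, "wound_induced": 10}
print(unhybridized_tester(tester, driver))
```

In this toy model equally expressed transcripts cancel out, while transcripts more abundant in the tester survive, which is the enrichment the paragraph describes.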

Materials and Methods

Plant material, growth conditions, and tissue collection

Tomato plants (Lycopersicon esculentum cv. Heinz) were obtained from Stokes Seeds (Buffalo, New York) and grown under controlled conditions in the NCSU Phytotron on a gravel/Peat-Lite substrate (developed at Cornell University, Ithaca, NY). Growth chambers maintained an environment of 16 h light (300 µmol m⁻² s⁻¹) at 26 °C and 8 h dark at 21 °C. A butane lighter flame was held for 2 seconds 1 cm below the third leaf of 3- to 4-week-old plants (about 12 cm in height, with the fourth leaf not fully expanded), causing immediate, localized tissue damage. For construction of the subtractive cDNA library, the fourth leaf was harvested from wounded and control plants 1 hour after wounding and immediately frozen in liquid nitrogen.

Subtractive cDNA library construction, screening and sequencing

A subtractive library was constructed using a PCR-Select cDNA subtraction kit (Clontech Laboratories, Inc., Palo Alto, CA, USA) such that wound-specific cDNAs were preferentially amplified. Subtraction was performed according to the manufacturer’s recommendations, with only slight modifications as described in Vian et al. (1999). The final PCR-amplified cDNAs were ligated into the T/A vector pT-7 Blue (Novagen, Madison, WI) for 2 h at room temperature using T4 DNA ligase (Gibco-BRL).

Library clones were grown on LB/ampicillin plates, and single colonies were picked and grown in LB/ampicillin suspension culture. Most clone cDNAs were prepared for sequencing by PCR amplification (35 cycles of 94 °C for 45 s denaturation, 56 °C for 60 s annealing, and 72 °C for 60 s extension) using primers specific for the pT-7 Blue cloning site (ACCATGATTACGCCAAGCTC and TAAAACGACGGCCAGTGAAT) and purified with a QIAquick PCR Purification kit (Qiagen, Valencia, CA). Other clones were prepared for sequencing using plasmid mini-preps (Qiagen, Valencia, CA). Sequencing was performed in forward (T7 primer) and reverse (pUC/M13 reverse primer) directions using a Beckman Coulter CEQ2000XL 8-capillary DNA sequencer (dye-terminator chemistry) at the Genomics Core Research Facility of the University of Nebraska-Lincoln.
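As a rough check that the pT-7 Blue cloning-site primers suit the 56 °C annealing step, their GC content and a Wallace-rule melting temperature (about 2 °C per A or T plus 4 °C per G or C) can be computed. The Wallace rule is only a crude estimate for short oligonucleotides, and the source does not say it was used to design these primers; the helper names are hypothetical.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)

def wallace_tm(primer):
    """Wallace rule: Tm (deg C) ~ 2 * (A + T) + 4 * (G + C). Rough, for short oligos only."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

for primer in ("ACCATGATTACGCCAAGCTC", "TAAAACGACGGCCAGTGAAT"):
    print(primer, f"GC = {gc_fraction(primer):.0%}", f"Tm ~ {wallace_tm(primer)} C")
```

Under this approximation the two primers come out at about 60 °C and 58 °C (50% and 45% GC), a few degrees above the 56 °C annealing temperature, which is consistent with the common practice of annealing slightly below the estimated Tm.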

DNA sequence analysis

All sequences were screened for vector, primer, and adaptor contamination (Coker and Davies, 2002) using VecScreen (http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html). Blast searches (version 2.2.7) were performed against GenBank nucleotide databases using tBlastx and Blastn, and against protein databases using Blastx. Alignments and other basic sequence analyses were performed using Vector NTI 7.1 (Informax, Bethesda, MD).
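VecScreen itself runs a BLAST search of each query against the UniVec database and tolerates mismatches. As a much cruder illustration of the same idea, the Python sketch below flags exact occurrences of known cloning-site primer sequences, or their reverse complements, within a read. The probe names are hypothetical, and exact matching will miss partial or error-containing contaminant segments that a similarity search would catch.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_exact_hits(read, probes):
    """Return (probe name, strand, start) for every exact match of a probe or its reverse complement."""
    read = read.upper()
    hits = []
    for name, probe in probes.items():
        for strand, pattern in (("+", probe.upper()), ("-", reverse_complement(probe))):
            start = read.find(pattern)
            while start != -1:
                hits.append((name, strand, start))
                start = read.find(pattern, start + 1)
    return hits

# Hypothetical probe set: the two pT-7 Blue cloning-site primers from the Methods.
probes = {
    "pT7Blue_site_F": "ACCATGATTACGCCAAGCTC",
    "pT7Blue_site_R": "TAAAACGACGGCCAGTGAAT",
}
read = "GGG" + probes["pT7Blue_site_F"] + "TTT"
print(find_exact_hits(read, probes))  # [('pT7Blue_site_F', '+', 3)]
```

The reported start position indicates where a read would need to be trimmed before downstream Blast searches.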

Comparisons with the Arabidopsis genome

Over 155,000 tomato expressed sequence tags (ESTs) from a variety of tissues are assembled in a single collection, the TIGR Tomato Gene Index (TGI; version 9.0; http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=tomato). By comparing tomato ESTs in the TIGR TGI with the Arabidopsis genome (using tBlastx), Van der Hoeven et al. (2002) divided tomato transcripts into “not homologous” (E value ≥ 0.1), “fast-evolving” (1.0E-15 < E value < 0.1), “intermediate-evolving” (1.0E-50 < E value < 1.0E-15), and “slow-evolving” (E value < 1.0E-50) classes. We identified the TIGR TGI tentative consensus sequences (Unigenes) used by Van der Hoeven et al. (2002) that corresponded to the cDNAs in our library. We then repeated the methodology of Van der Hoeven et al. (2002) by performing tBlastx searches against the Arabidopsis genome (www.Arabidopsis.org).
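The four conservation classes above reduce to thresholds on the best tBlastx E value against Arabidopsis. A minimal Python sketch follows. The stated ranges leave the exact boundary values (1.0E-15 and 1.0E-50) unassigned, so placing them in the more conserved class here is an assumption, as is treating an E value reported as 0.0 (as BLAST does for very strong hits) as slow-evolving.

```python
def conservation_class(e_value):
    """Classify a tomato transcript by its best tBlastx E value against Arabidopsis.

    Thresholds follow the Van der Hoeven et al. (2002) classes as described in the text.
    E values exactly equal to 1e-15 or 1e-50 go to the more conserved class
    (an assumption, since the stated ranges leave those points undefined).
    """
    if e_value >= 0.1:
        return "not homologous"
    if e_value > 1e-15:
        return "fast-evolving"
    if e_value > 1e-50:
        return "intermediate-evolving"
    return "slow-evolving"

for e in (0.5, 1e-5, 1e-20, 1e-80, 0.0):
    print(e, conservation_class(e))
```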

Results

Overview of the subtractive cDNA library

Approximately 100 clones were sequenced from the subtractive cDNA library. After redundant clones and clones representing different parts of the same transcripts were discounted, 46 cDNAs representing unique transcripts remained. This set of unique transcripts is shown in Table 1. The average length of unique cDNAs from the library was 270 bp. This was expected because the library was constructed using a restriction enzyme with a 4-base recognition site, and in random sequence such a site occurs on average once every 4^4 = 256 bp.
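The 256 bp expectation assumes equal, independent base frequencies: a specific 4-base site then starts at any given position with probability (1/4)^4 = 1/256, so successive sites lie 256 bp apart on average. The Python simulation below checks this on random sequence. GTAC (the RsaI site) is used only as an example 4-base site, since the text does not name the enzyme; real cDNA is not random sequence, which is one possible reason the observed average (270 bp) differs slightly.

```python
import random

def site_positions(seq, site):
    """Start positions of every occurrence of site in seq (overlaps included)."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def mean_site_spacing(seq, site):
    """Mean distance between successive sites, i.e. the mean internal fragment length."""
    positions = site_positions(seq, site)
    if len(positions) < 2:
        raise ValueError("need at least two occurrences of the site")
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)

expected = 4 ** 4  # 256 bp for any specific 4-base site in random sequence
rng = random.Random(0)
random_seq = "".join(rng.choices("ACGT", k=1_000_000))
observed = mean_site_spacing(random_seq, "GTAC")  # example site; see lead-in
print(expected, round(observed))
```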

Identifications of transcripts in Table 1 were made using Blast searches of GenBank and are considered putative, since many transcripts have not been previously described in tomato plants. Those that have not been described in any plant are listed as unknowns. Thirty-six of the 46 transcripts fell into 5 functional classes: enzymes of general metabolism; protein synthesis, modification, and transport; transcription; membrane transport; and photosynthesis and respiration (Table 1). Five transcripts were placed in an “other” class, and 5 unidentifiable cDNAs were labeled as unknowns (Table 1). The largest functional class in terms of number of unique transcripts was “photosynthesis and respiration” (14 transcripts). A large number of transcripts were also associated with synthesizing, modifying, and/or transporting RNA and protein (10 transcripts). Perhaps most striking was the presence of 4 of the 6 key enzymes in phenylpropanoid biosynthesis and the activated methyl cycle (PAL, 4CL, SAHH, and SAMS), suggesting that the plants were increasing their capacity to make secondary metabolites.

Table 1. Summary of a subtractive cDNA library containing transcripts systemically up-regulated in the hour after fire damage.

Functional category / Putative identification of cDNA | Size (bp) | Putative function | Previous publication

Enzymes of general metabolism
  Phenylalanine ammonia-lyase (PAL5) | 471 | Phenylpropanoid metabolism; catalyzes conversion of phenylalanine to cinnamate | -
  4-coumarate:coenzyme A ligase (4CL-1) | 241 | Phenylpropanoid metabolism; catalyzes conversion of 4-coumarate to 4-coumaroyl-CoA, etc. | -
  S-adenosyl-L-homocysteine hydrolase (SAHH) | 113 | Activated methyl cycle; catalyzes conversion of S-adenosyl-homocysteine to homocysteine | -
  S-adenosyl-L-methionine synthetase (SAMS) | 117 | Activated methyl cycle; catalyzes conversion of methionine to S-adenosyl-methionine | -
  Catalase | 464 | Degrades hydrogen peroxide | -
  Alpha/beta fold family protein | 59 | Catalytic enzyme, most likely with hydrolase activity | Coker et al., 2004
  Adenylyl-sulfate reductase | 189 | Sulfate assimilation | Coker et al., 2004
  Aspartokinase/homoserine dehydrogenase | 121 | Amino acid biosynthesis (lysine, threonine, isoleucine, and methionine) | -
Protein synthesis, modification, and transport
  30S ribosomal protein S5 (RPS5) | 279 | Translation; mRNA binding | -
  Elongation factor 1-alpha (LeEF-1) | 147 | Translation; brings aminoacyl-tRNA to the ribosome | -
  Leucine aminopeptidase (LAP) | 330 | Catalyzes the hydrolysis of amino acids from the N terminus of peptides/proteins | -
  UDP-glucose:protein transglucosylase (UPTG2) | 168 | Glycosyltransferase involved in cell wall biosynthesis | -
  Glycosyltransferase | 314 | Transfers oligosaccharides to proteins in the ER to make glycoproteins | -
  Chloroplast-specific ribosomal protein | 303 | Translation in the chloroplast | Coker et al., 2004
  ARF family GTP-binding protein (ARF1) | 357 | ADP-ribosylation factor; regulation of vesicle-mediated protein transport | -
Transcription
  Basic leucine zipper (BZIP) | 808 | Leucine zipper domain transcription factor (DNA-binding protein) | Stanković et al., 2000
  Chloroplast mRNA-binding protein (CMBP) | 521 | Allows correct processing of chloroplast mRNAs; forms stem-loop structure within 3'-UTR | Vian et al., 1999
  Phantastica (PHAN) | 180 | Myb family transcription factor required for meristem establishment | -
Membrane transport
  Aquaporin (MIP2) | 91 | Forms water-selective membrane channels | -
  c subunit of V-ATPase (LeVHA-c2) | 132 | Couples the hydrolysis of ATP to the transport of protons across membranes; alters pH levels | Coker et al., 2003
  AUX1-like permease (LAX2) | 191 | Auxin transport | -
  Putative anion:sodium symporter | 647 | Transports anions with sodium through membranes | Coker et al., 2004
Photosynthesis and respiration
  Pyruvate kinase (cytosolic isozyme) | 105 | Converts PEP to pyruvate during glycolysis; this reaction is a major regulatory step of glycolysis | -
  Hydroxypyruvate reductase (HPR) | 398 | Conversion of hydroxypyruvate to glycerate | -
  Rubisco small subunit (RBCS) | 44 | Carboxylation of ribulose-1,5-bisphosphate (RuBP) | Davies et al., 1997
  Rubisco activase | 252 | Activates rubisco by removing RuBP (in the presence of ATP) | -
  Plastidic aldolase (AldP) | 311 | Catalyzes a reaction in the Calvin cycle | -
  Photosystem I subunit precursor | 375 | Photosystem I polypeptide | -
  Photosystem I reaction center subunit | 226 | Photosystem I polypeptide | -
  Chlorophyll a/b-binding protein (similar to CAB-1A) | 206 | Photosystem I polypeptide | -
  Chlorophyll a/b-binding protein (similar to CAB-1B) | 208 | Photosystem I polypeptide | -
  Chlorophyll a/b-binding protein (similar to CAB-1C) | 216 | Photosystem I polypeptide | -
  Chlorophyll a/b-binding protein (CAB-11) | 467 | Photosystem I polypeptide | -
  Chlorophyll b-binding protein (CAB-10B) | 329 | Photosystem II polypeptide | -
  Photosystem II oxygen-evolving complex protein 3 | 119 | Photosystem II polypeptide | Coker et al., 2004
  Photosystem II 10 kD polypeptide | 98 | Photosystem II polypeptide | -
Other
  Leucine-rich repeat (LRP) protein | 83 | Receptors involved in cell surface recognition of ligands produced by pathogens | -
  Histidine triad (HIT) family protein | 543 | Cell-cycle regulation | Coker et al., 2004
  Glucosyltransferase (similar to zeatin O-glucosyltransferase) | 483 | Glycosylation of zeatin (a cytokinin); O-glucosylation protects zeatin against cytokinin oxidases | -
  Metallothionein-like protein (LEMT4) | 425 | Binds heavy metals for uptake and detoxification; may protect cellular constituents from oxidative damage | -
  Acyl carrier | 229 | Shuttles intermediates of type II fatty acid synthase system | Coker et al., 2004
Unknown
  Unknown; similar to 5-hydroxytryptamine receptor in snails | 286 | - | -
  Unknown; similar to queuine tRNA-ribosyltransferase | 150 | - | -
  Unknown | 333 | - | Coker et al., 2004
  Unknown | 206 | - | Coker et al., 2004
  Unknown | 83 | - | -

Library validation

Several lines of evidence suggest that the transcripts presented here are systemically up-regulated after fire damage. First, the transcripts were isolated from the subtractive cDNA library. Second, Northern blots and real-time RT-PCR experiments for various library clones (using RNA derived from tissue independent of the library) consistently show higher mRNA levels in leaf tissue collected in the hour after wounding than in control tissue (Davies et al., 1997; Vian et al., 1999; Stanković et al., 2000; Coker et al., 2004). Detailed timecourse experiments of mRNA accumulation kinetics have been performed for each transcript with a “previous publication” in Table 1, so the accumulation of at least one transcript from each functional class has been explored in detail. For genes previously examined, the most common pattern of transcript accumulation in leaf 4 of three-week-old tomato plants following a flame wound on leaf 3 is an increase that peaks within an hour, followed by a rapid decrease. These rapid changes are then followed by a more gradual period of increased, decreased, or unchanged transcript accumulation (Davies et al., 1997; Vian et al., 1999; Stanković et al., 2000; Coker et al., 2004). Finally, many of the transcripts in Table 1 (and homologues of these transcripts) have been implicated in other types of wound and stress responses in previous studies (see Discussion). Thus, there is substantial evidence that the transcripts presented here are, in fact, systemically up-regulated after a leaf is damaged by fire.

Conservation between tomato and Arabidopsis

By comparing tomato transcripts in the TIGR TGI with the Arabidopsis genome (using tBLASTX), Van der Hoeven et al. (2002) divided tomato transcripts into “not homologous”, “fast-evolving”, “intermediate-evolving”, and “slow-evolving” classes. The percentage of all tomato transcripts in the TIGR TGI falling into each category is shown in Figure 1. By repeating their methodology (BLAST searching the Arabidopsis genome using the TIGR TGI unigenes that corresponded to our transcripts), we found that 3 (7%), 2 (4%), 10 (22%), and 31 (67%) of the 46 unique transcripts in our library could be considered not homologous, fast-evolving, intermediate-evolving, and slow-evolving, respectively (Figure 1). Therefore, two-thirds of the transcripts in the subtractive cDNA library are highly conserved (slow-evolving). It follows that fire damage causes the systemic up-regulation of a set of highly conserved transcripts.
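The percentages reported above follow directly from the class counts. A minimal Python sketch of that arithmetic, assuming the published figures were rounded to whole percentages:

```python
# Counts of the 46 library transcripts in each conservation class (from the text)
counts = {
    "not homologous": 3,
    "fast-evolving": 2,
    "intermediate-evolving": 10,
    "slow-evolving": 31,
}


def class_percentages(counts):
    """Return each class's share of all transcripts, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


print(sum(counts.values()))       # 46
print(class_percentages(counts))  # 7, 4, 22, and 67 percent, respectively
```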

[Figure 1 pie charts. a) Not homologous 17%, fast-evolving 24%, intermediate-evolving 37%, slow-evolving 22%. b) Not homologous 7%, fast-evolving 4%, intermediate-evolving 22%, slow-evolving 67%.]

Figure 1. Conservation of transcript sequences between tomato and Arabidopsis. a) Entire tomato transcriptome (data from Van der Hoeven et al., 2002). b) Transcripts systemically up-regulated in the hour after fire damage.

This high degree of conservation could reflect the ancestral nature of responses to fire damage, or to wounding in general (Bond and van Wilgen, 1996). Consistent with this, tomato genes directly involved in cell rescue, defense, cell death, and aging are, as a group, mostly not fast-evolving (Van der Hoeven et al., 2002).

One potential objection to this conclusion is that the clones in a cDNA library most likely to be sequenced are the highly abundant ones, and highly abundant transcripts may also be highly conserved. However, unlike most cDNA libraries, subtractive libraries constructed using suppression subtractive hybridization are greatly enriched for low-abundance transcripts (Diatchenko et al., 1999).

Furthermore, we have performed EST analysis for the library transcripts and found no evidence that a large proportion of them are highly abundant (Coker et al., 2003; Coker et al., 2004). Finally, most transcripts in each of the 5 functional classes (Table 1) are slow-evolving, and so the trend of high conservation clearly extends beyond just those which might be highly abundant.

Discussion

Transcripts common to other wound and stress responses

Since little work has been done on the response to fire damage at the level of gene expression, a fundamental question is how it compares with responses to other wounds and stresses. Homologues of at least half of the transcripts reported in Table 1 have been previously associated with other wounds or stresses.

Eight transcripts from the subtractive cDNA library encoded enzymes of general metabolism (Table 1); all of these were slow-evolving, and at least 6 have been associated with other wounds/stresses. The up-regulation of phenylalanine ammonia-lyase (encoded by PAL5) and 4-coumarate:coenzyme A ligase (encoded by 4CL) is important during responses to mechanical wounding, herbivory, dehydration, and pathogen infection (Edwards et al., 1985; Lawton and Lamb, 1987; Arimura et al., 2000; Reymond et al., 2000). Phenylalanine is a starting material for the biosynthesis of coumarins, benzoic acid derivatives, lignin, anthocyanins, isoflavones, condensed tannins, simple phenylpropanoids, and other secondary phenolics (Figure 2). PAL catalyzes the conversion of L-phenylalanine to trans-cinnamic acid, which is the first committed step of phenylpropanoid biosynthesis (Figure 2). Expression of PAL5 and 4CL is known to be coordinately enhanced by environmental stresses (Somssich and Hahlbrock, 1998), and often leads to the production of phenolic, defense-related compounds.

[Figure 2 diagram (chemical structures not reproduced): phenylalanine → PAL (releases NH3) → trans-cinnamic acid → C4H → para-coumaric acid → 4CL (CoA-SH) → para-coumaroyl CoA. Downstream products shown: benzoic acid derivatives, simple phenylpropanoids, coumarins, lignin precursors, flavonoids, anthocyanins, and condensed tannins.]

Figure 2. Phenylpropanoid biosynthesis from phenylalanine. The key enzymes phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H), and 4-coumarate:coenzyme A ligase (4CL) are necessary for downstream production of a wide variety of phenolic compounds (indicated in shaded boxes).

S-adenosyl-L-methionine synthetase (SAMS) and S-adenosyl-L-homocysteine hydrolase (SAHH) are enzymes of the activated methyl cycle (Figure 3), which supplies the methyl donor S-adenosyl-L-methionine for the methylation of practically every class of plant metabolite (Moffatt and Weretilnyk, 2001). The methyl cycle also leads to the biosynthesis of ethylene (Figure 3). SAMS and SAHH are known to be up-regulated in response to mechanical wounding, pathogen infection, herbivory, and salt stress (Kawalleck et al., 1992; Espartero et al., 1994; Arimura et al., 2000; Reymond et al., 2000). For example, Kawalleck et al. (1992) found that fungal elicitor strongly up-regulated both mRNAs in parsley.

[Figure 3 diagram: methionine → SAMS → S-adenosylmethionine → S-adenosylhomocysteine → SAHH → homocysteine → HMT → methionine; S-adenosylmethionine → ACC synthase → ACC → ACC oxidase → ethylene. Other labels: methylated products, furanocoumarins, methionine salvage cycle.]

Figure 3. The methyl cycle and ethylene synthesis. The key enzymes of the methyl cycle, homocysteine S-methyltransferase (HMT), S-adenosyl-L-methionine synthetase (SAMS), and S-adenosyl-L-homocysteine hydrolase (SAHH), promote methylation of a variety of compounds, as well as the production of ethylene through the activity of ACC synthase and ACC oxidase.

Catalase scavenges hydrogen peroxide to protect against oxidative damage, and is known to be up-regulated during dehydration and freezing stress (Knight and Knight, 2001). Similarly, adenylyl-sulfate reductase, which functions in sulfate assimilation, is up-regulated during oxidative stress (Bick et al., 2001; Coker et al., 2004).

Seven transcripts from the subtractive cDNA library have been associated with protein synthesis, modification, and transport (Table 1), and 6 of these were slow-evolving. Most notable in this category is leucine aminopeptidase (LAP), which is up-regulated after mechanical wounding, pathogen infection, dehydration, and salt stress (Gu et al., 1996; Chao et al., 1999). Although not generally considered “wound proteins”, transcript levels of ribosomal proteins, EF-1 alpha, and other protein modifiers/transporters are sensitive to certain wounds (Arimura et al., 2000).

Three transcripts from the subtractive cDNA library encoded transcription factors (Table 1). Two of these were slow-evolving, and all 3 have been previously associated with other wounds or stresses. Basic leucine zippers have been associated with pathogen infection (Jakoby et al., 2002), chloroplast mRNA-binding protein with mechanical wounding (Vian et al., 1999), and phantastica with the feeding sites of root-knot nematodes (Koltai et al., 2001).

Four transcripts from the subtractive cDNA library are associated with membrane transport (Table 1). Three of these were slow-evolving, and homologues of at least 3 have been previously associated with other wounds or stresses. Aquaporins, c subunits of vacuolar ATPase, and the anion:sodium symporter are up-regulated during herbivory (Arimura et al., 2000), salt stress (Chen et al., 2002), and exposure to toxins (Bobrowicz et al., 1997), respectively.

Five transcripts from the library do not fit the other functional categories (Table 1). Most notable among these was a metallothionein-like protein associated with mechanical wounding, herbivory, dehydration, and metal detoxification (Giritch et al., 1998; Arimura et al., 2000; Reymond et al., 2000). Also, leucine-rich repeat proteins are commonly associated with pathogen infection (Tornero et al., 1996).

In summary, many of the transcripts systemically up-regulated after fire damage are similar to those up-regulated in response to other wounds and stresses. Since most transcripts in all 5 functional categories (67% overall) were slow-evolving, it also appears that these transcripts are highly conserved in plants. Taken together, both conclusions support the notion that plants respond to multiple wound/stress stimuli by common, highly conserved mechanisms.

Transcripts not common to other wound and stress responses

Most previous studies show that photosynthetic genes such as rubisco small subunit are unaffected or down-regulated by wounding, stress, or pathogen infection (Hermsmeier et al., 2001). This down-regulation is associated with a shift of carbon from primary metabolism to defense. In the current investigation of the systemic response to fire damage, however, 14 transcripts from the subtractive cDNA library encoded proteins involved in photosynthesis and respiration (Table 1). The systemic accumulation of rubisco small subunit and photosystem II oxygen-evolving complex protein 3 transcripts after fire damage has been described previously (Davies et al., 1997; Coker et al., 2004). Notably, aspartokinase/homoserine dehydrogenase, an enzyme of general metabolism in this study that has not previously been associated with wound/stress responses, is regulated by photosynthesis-related signals (Zhu-Shimoni and Galili, 1998).

These observations suggest that fire damage may provoke a very different response than other wounds with regard to energy metabolism. For example, oxidative damage and leaf tissue damage caused by fire may decrease photosynthetic capacity, which must then be restored. In the natural environment, fire damage differs from pathogen attack or herbivory in that photosynthesis and growth can be of immediate importance (perhaps to replace leaves). It is also not unprecedented for photosynthetic genes to play a role in a stress response. For example, homologues of 3 genes in this study (plastidic aldolase, photosystem II OEC protein 3, and rubisco activase) have been associated with responses to salt stress (Yamada et al., 2000; Sugihara et al., 2000; Gu et al., 2004). The systemic up-regulation of photosynthetic genes after fire damage raises an important question: are photosynthetic genes necessary components of the early response to fire damage, or is their up-regulation merely the result of a perturbation in an interconnected network? This question cannot yet be answered and will be an interesting topic for future study.

References

Arimura G, Tashiro K, Kuhara S, Nishioka T, Ozawa R, Takabayashi J (2000) Gene responses in bean leaves induced by herbivory and by herbivore-induced volatiles. Biochem Biophys Res Commun 277: 305-310.

Bertram L, Lercari B (2000) Phytochrome A and phytochrome B1 control the acquisition of competence for shoot regeneration in tomato hypocotyl. Plant Cell Reports 19: 604-609.

Bick JA, Setterdahl AT, Knaff DB, Chen Y, Pitcher LH, Zilinskas BA, Leustek T (2001) Regulation of the plant-type 5'-adenylyl sulfate reductase by oxidative stress. Biochemistry 40: 9040-9048.

Bobrowicz P, Wysocki R, Owsianik G, Goffeau A, Ulaszewski S (1997) Isolation of three contiguous genes, ACR1, ACR2 and ACR3, involved in resistance to arsenic compounds in the yeast Saccharomyces cerevisiae. Yeast 13: 819-828.

Bond WJ, van Wilgen BW (1996) Fire and plants. Chapman & Hall: London.

Braam J, Davis RW (1990) Rain-, wind-, and touch-induced expression of calmodulin related genes in Arabidopsis. Cell 60: 357-364.

Chao WS, Gu Y-Q, Pautot V, Bray EA, Walling LL (1999) Leucine aminopeptidase RNAs, proteins, and activities increase in response to water deficit, salinity, and wound signals systemin, methyl jasmonate, and abscisic acid. Plant Physiol 120: 979-992.

Chen X, Kanokporn T, Zeng Q, Wilkins TA, Wood AJ (2002) Characterization of the V-type H(+)-ATPase in the resurrection plant Tortula ruralis: accumulation and polysomal recruitment of the proteolipid c subunit in response to salt-stress. J Exp Bot 53: 225-232.

Coker JS, Davies E (2002) Correspondence re: A.H. Ree et al., Expression of a novel factor in human breast cancer cells with metastatic potential (Cancer Res., 59: 4675-4680, 1999). Cancer Res 62: 4164-4165.

Coker JS, Jones D, Davies E (2003) Identification, conservation, and relative expression of V-ATPase cDNAs in tomato plants. Plant Molecular Biology Reporter 21: 145-158.

Coker JS, Vian A, Davies E (2004) Identification, accumulation, and functional prediction of novel tomato transcripts systemically up-regulated after fire damage. Submitted.

Davies E, Vian A, Vian C, Stankovic B (1997) Rapid systemic up-regulation of genes after heat-wounding and electrical stimulation. Acta Physiologiae Plantarum 4: 571-576.


Debano LF, Neary DG, Ffolliott PF (1998) Fire effects on ecosystems. Wiley & Sons, Inc.: New York.

Diatchenko L, Lukyanov S, Lau YF, Siebert PD (1999) Suppression subtractive hybridization: a versatile method for identifying differentially expressed genes. Methods Enzymol 303: 349-80.

Drewa PB, Platt WJ, Moser EB (2002) Fire effects on resprouting of shrubs in headwaters of southeastern longleaf pine savannas. Ecology 83: 755-767.

Edwards K, Cramer CL, Bolwell GP, Dixon RA, Schuch W, Lamb CJ (1985) Rapid transient induction of phenylalanine ammonia-lyase mRNA in elicitor-treated bean cells. Proc Natl Acad Sci USA 82: 6731-6735.

Espartero J, Pintor-Toro JA, Pardo JM (1994) Differential accumulation of S-adenosylmethionine synthetase transcripts in response to salt stress. Plant Mol Biol 25: 217-227.

Giritch A, Ganal M, Stephan UW, Baumlein H (1998) Structure, expression and chromosomal location of the metallothionein-like gene family of tomato. Plant Mol Biol 37: 701-714.

Graham JS, Hall G, Pearce G, Ryan CA (1986) Regulation of synthesis of proteinase inhibitors I and II mRNAs in leaves of wounded tomato plants. Planta 169: 399-405.

Green TR, Ryan CA (1972) Wound-induced proteinase inhibitor in plant leaves – possible defense mechanism against insects. Science 175: 776-777.

Gu YQ, Chao WS, Walling LL (1996) Localization and post-translational processing of the wound-induced leucine aminopeptidase proteins of tomato. J Biol Chem 271: 25880-25887.

Gu R, Fonseca S, Puskas LG, Hackler L Jr, Zvara A, Dudits D, Pais MS (2004) Transcript identification and profiling during salt stress and recovery of Populus euphratica. Tree Physiol 24: 265-276.

Hermsmeier D, Schittko U, Baldwin IT (2001) Molecular interactions between the specialist herbivore Manduca sexta (Lepidoptera, Sphingidae) and its natural host Nicotiana attenuata. I. Large-scale changes in the accumulation of growth- and defense-related plant mRNAs. Plant Physiol 125: 683-700.

Howe GA, Lee GI, Itoh A, Li L, DeRocher AE (2000) Cytochrome P450-dependent metabolism of oxylipins in tomato. Cloning and expression of allene oxide synthase and fatty acid hydroperoxide lyase. Plant Physiol 123: 711-724.


Jakoby M, Weisshaar B, Droge-Laser W, Vicente-Carbajosa J, Tiedemann J, Kroj T, Parcy F (2002) bZIP transcription factors in Arabidopsis. Trends in Plant Science 7: 106-111.

Kawalleck P, Plesch G, Hahlbrock K, Somssich IE (1992) Induction by fungal elicitor of S-adenosyl-L-homocysteine hydrolase mRNAs in cultured cells and leaves of Petroselinum crispum. Proc Natl Acad Sci USA 89: 4713-4717.

Knight H, Knight MR (2001) Abiotic stress signaling pathways: specificity and cross-talk. Trends in Plant Science 6: 262-267.

Koltai H, Dhandaydham M, Opperman C, Thomas J, Bird D (2001) Overlapping plant signal transduction pathways induced by a parasitic nematode and a rhizobial endosymbiont. Mol Plant Microbe Interact 14: 1168-1177.

Lawton MA, Lamb CJ (1987) Transcriptional activation of plant defense genes by fungal elicitor, wounding and infection. Mol Cell Biol 7: 335-341.

León J, Enrique R, Sánchez-Serrano JJ (2001) Wound signalling in plants. J Exper Bot 52: 1-9.

Low RK, Prakash AP, Swarup S, Goh CJ, Kumar PP (2001) A differentially expressed bZIP gene is associated with adventitious shoot regeneration in leaf cultures of Paulownia kawakamii. Plant Cell Reports 20: 696-700.

Moffatt BA, Weretilnyk EA (2001) Sustaining S-adenosyl-L-methionine-dependent methyltransferase activity in plant cells. Physiologia Plantarum 113: 435-442.

Pearce G, Strydom D, Johnson S, Ryan CA (1991) A polypeptide from tomato leaves induces wound-inducible proteinase inhibitor proteins. Science 253: 895-898.

Peres LE-P, Morgante PG, Vecchi C, Kraus JE, van Sluys MA (2001) Shoot regeneration capacity from roots and transgenic hairy roots of tomato cultivars and wild related species. Plant Cell Tissue and Organ Culture 65: 37-44.

Platt WJ, Evans GW, Davis MM (1988) Effects of fire season on flowering of forbs and shrubs in longleaf pine forests. Oecologia 76: 353-363.

Preston CA, Baldwin IT (1999) Positive and negative signals regulate germination in the post-fire annual, Nicotiana attenuata. Ecology 80: 481-494.

Reymond P, Weber H, Damond M, Farmer EE (2000) Differential gene expression in response to mechanical wounding and insect feeding in Arabidopsis. Plant Cell 12: 707-719.

Schaller A, Ryan CA (1996) Molecular cloning of a tomato leaf cDNA encoding an aspartic protease, a systemic wound response protein. Plant Mol Biol 31: 1073-1077.

Somssich IE, Hahlbrock K (1998) Pathogen defence in plants – a paradigm of biological complexity. Trends in Plant Science 3: 86-90.

Stanković B, Vian A, Henry-Vian C, Davies E (2000) Molecular cloning and characterization of a tomato cDNA encoding a systemically wound-inducible bZIP DNA-binding protein. Planta 212: 60-66.

Sugihara K, Hanagata N, Dubinsky Z, Baba S, Karube I (2000) Molecular characterization of cDNA encoding oxygen evolving enhancer protein 1 increased by salt treatment in the mangrove Bruguiera gymnorrhiza. Plant Cell Physiol 41: 1279-1285.

Takashina T, Suzuki T, Egashira H, Imanishi S (1998) New molecular markers linked with the high shoot regeneration capacity of the wild tomato species Lycopersicon chilense. Breeding Science 48: 109-113.

Taylor IB (1986) Biosystematics of the tomato. The tomato crop: A scientific basis for improvement. Eds Atherton JG and Rudich J. Chapman and Hall Ltd: New York.

Taylor JLS, van Staden J (1998) Plant-derived smoke solutions stimulate the growth of Lycopersicon esculentum roots in vitro. Plant Growth Regulation 26: 77-83.

Tornero P, Mayda E, Gomez M, Canas L, Conejero V, Vera P (1996) Characterization of LRP, a leucine-rich repeat (LRR) protein from tomato plants that is processed during pathogenesis. Plant J 10: 315-330.

Van der Hoeven R, Ronning C, Giovannoni J, Martin G, Tanksley S (2002) Deductions about number, organization, and evolution of genes in the tomato genome based on analysis of a large expressed sequence tag collection and selective genomic sequencing. Plant Cell 14: 1441-1456.

Verdaguer D, Ojeda F (2002) Root starch storage and allocation patterns in seeder and resprouter seedlings of two Cape Erica (Ericaceae) species. Amer J Botany 89: 1189-1196.

Vian A, Henry-Vian C, Davies E (1999) Rapid and systemic accumulation of chloroplast mRNA-binding protein transcripts after flame stimulus in tomato. Plant Physiol 121: 517-524.

Yamada S, Komori T, Hashimoto A, Kuwata S, Imaseki H, Kubo T (2000) Differential expression of plastidic aldolase genes in Nicotiana plants under salt stress. Plant Sci 154: 61-69.

Zhu-Shimoni JX, Galili G (1998) Expression of an Arabidopsis aspartate kinase/homoserine dehydrogenase gene is metabolically regulated by photosynthesis-related signals but not by nitrogenous compounds. Plant Physiol 116: 1023-1028.


Chapter 7

Conclusions and Future Directions

Conclusions and future directions regarding the development of methods for gene expression analysis using sequence data: Blueprint for a universal sequencing-based method of gene expression analysis

Abstract

Modern methods of measuring gene expression rely upon complementary base pairing or antibody binding, followed by some measurement of radiation emitted or absorbed by a large population of molecules. These indirect measurements of gene expression have many limitations that are inherent to all binding-radiation methods. A direct measurement of gene expression would be to sequence individual cDNA molecules to establish the actual number of molecules present. This section presents the advantages of sequencing-based measurements of gene expression and offers a set of specifications upon which future analyses could be based. The advantages of a direct sequencing method include integration of gene discovery with expression analyses, universal comparability between transcript frequencies, and universal comparability between experiments. Currently, the major obstacles to a universal sequencing-based method for gene expression analysis involve 1) technology for sequencing individual cDNA molecules, 2) sequence quality control, and 3) methods for data analysis. To elucidate what advances are needed to overcome these obstacles, specifications and a blueprint for a system that would allow gene expression analysis in any species are provided.

Disadvantages of binding-radiation methods

Microarrays, northern and western blots, ELISA, RT-PCR, and the other popular methods for measuring gene expression have two common traits. First, they all rely upon complementary binding (for measuring RNA or cDNA) or binding to an antibody (for measuring proteins) for specificity. Second, they depend upon some measurement of electromagnetic radiation (some wavelength of light), which is (in theory) proportional to the amount of cDNA, RNA, or protein present. Together, these two traits make all of these methods indirect: they do not directly measure the number of cDNA, RNA, or protein molecules. In other words, binding-radiation methods measure an amount of radiation associated with the binding capacity of large populations of molecules, but do not find exact numbers of individual molecules.

Two sets of disadvantages are associated with binding-radiation methods. The first set is related to binding. Indirect methods require hybridization probes or antibodies for every gene in a species. Generating probes and antibodies is time- and labor-intensive and prone to error, making it practical only for economically important species.

Furthermore, because all probes and antibodies are different, their binding efficiencies are different. Therefore, it is not usually possible to make quantitative comparisons between the expression levels of different genes, or even comparisons using the same gene when different probes/antibodies are used. By contrast, direct methods require one to isolate RNA (or protein), convert RNA to cDNA, perhaps clone the cDNA (depending upon the sequencing technology), and then perform sequencing. Therefore, after RNA (or protein) has been isolated, there could be one protocol suitable for measuring the expression of every gene for every species, and one software tool to analyze the data.

The second set of disadvantages associated with binding-radiation methods is related to detection of the light/radiation. Indirect methods usually have an upper limit of quantification due to saturation (too much light to measure accurately) and/or a lower limit due to a threshold of detection (too little light to measure accurately, if at all). Direct methods avoid both problems because they count molecules rather than measure light; given sufficient sequencing depth, they would reliably quantify the expression of both low- and high-abundance transcripts (and proteins).

Advantages of sequencing methods

The direct method of measuring gene expression would be to sequence the individual transcript (or protein) molecules of a given cell or tissue and find the number of individual transcripts. Thus, by sequencing each transcript (or protein), one attains a direct, absolute measurement of gene expression. Sequencing individual full-length transcripts (or proteins) from a given cell or tissue represents the ultimate measurement of gene expression. Because of the increased specificity and precision of sequencing data, direct methods will be more reliable for measuring gene expression in the following scenarios:

a) Comparing homologous genes in the same species.
b) Comparing homologous genes in different species.
c) Comparing splice variants of the same gene.
d) Identifying SNPs and/or unexpected sequence variations between individuals.
e) Measuring small variations in gene expression.
f) Any other experiment where exact quantification (the number of a particular transcript in a population of transcripts) is needed.

The indirect binding-radiation methods are already widely used, whereas direct sequencing methods are used less because of the time and high cost of sequencing thousands of cDNAs (or other types of molecules). However, sequencing costs are decreasing, and technology is being developed so that individual DNA molecules can be sequenced (Braslavsky et al., 2003). In the coming decades, advances in nanotechnology are likely to make direct sequencing methods a reality and indirect binding-radiation methods obsolete, for the reasons given below.

Integration of gene discovery with expression analyses

Gene discovery and measurement of gene expression are currently two distinct steps. In general, sequencing is the preferred method of gene discovery, whereas binding-radiation methods are the preferred methods of measuring gene expression. To design an experiment to measure gene expression using binding-radiation methods, one must already know something about the gene to be measured. Using sequencing methods (e.g., sequencing a cDNA library), however, one needs no prior knowledge of the genes in question: both gene discovery and measurement of gene expression occur using one method.
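To make the point concrete, here is a minimal Python sketch in which one set of reads yields both expression frequencies and a previously unknown transcript. The read sequences and the "known" set are hypothetical, and grouping reads by exact identity stands in for real sequence clustering, which must tolerate sequencing errors and partial overlaps.

```python
from collections import Counter

# Hypothetical reads; a real pipeline would first cluster similar reads
# (exact identity is used here only to keep the sketch short).
reads = ["ATGGCT", "ATGGCT", "TTGACC", "ATGGCT", "GGCATA", "TTGACC"]

# Hypothetical set of transcripts already known before the experiment
known = {"ATGGCT", "TTGACC"}

counts = Counter(reads)
total = sum(counts.values())

# Expression: the frequency of each transcript in the population
frequencies = {seq: n / total for seq, n in counts.items()}

# Discovery: transcripts seen in the same data but not previously known
novel = sorted(set(counts) - known)

print(frequencies)
print(novel)  # ['GGCATA']
```

No probe design precedes the count: the unknown transcript is quantified in the same pass that reveals it.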

Universal comparability between transcript frequencies

As stated above, differences in binding efficiencies prevent direct, quantitative comparisons between expression levels of different genes. Figure 1 illustrates the number of direct comparisons which could be made between 2 transcript populations using binding-radiation and sequencing methods. For binding-radiation methods, binding efficiencies and light emission may be different for different transcripts, allowing direct comparisons to be made only for identical transcripts (i.e. comparing transcript A in population 1 with transcript A in population 2). On the other hand, sequencing methods generate transcript frequencies which allow direct comparisons between any 2 transcripts.

In the example using only 3 transcripts from 2 transcript populations in Figure 1, a binding-radiation method would allow only 3 direct comparisons, while a sequencing method would allow 15 comparisons.


[Figure 1 diagram: panels a) Binding-radiation method and b) Sequencing method, each listing transcripts A, B, and C in Population 1 and Population 2, with lines marking the comparisons that can be made.]

Figure 1. Comparisons that can be made between 2 transcript populations using binding-radiation (a) and sequencing (b) methods. Each line represents a reliable comparison between transcript levels. Binding-radiation methods allow reliable comparisons only between identical transcripts, whereas sequencing methods allow comparisons between all transcripts.

The mathematical expression for the number of comparisons which could be made using a sequencing method is described by

$$C = \frac{n^2 - n}{2} \qquad \text{[Eq. 1]}$$

where C is the number of possible comparisons and n is the total number of transcript frequencies in all populations. For example, a modern microarray experiment using 2 gene chips with 30,000 probes each allows 30,000 comparisons to be made. A sequencing method that generated data for the same 30,000 transcripts from the same 2 transcript populations would allow 1.8 × 10^9 comparisons (where n = 30,000 × 2). Thus, the potential usefulness of the data increases 60,000-fold. This number is actually an underestimate, since direct comparisons between groups of transcripts would also be possible (e.g., all actin genes, all photosynthetic genes, etc.).
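Eq. 1 and the worked numbers can be checked with a short Python sketch; the function and variable names are illustrative only. Because n^2 - n = n(n - 1) is always even, integer division is exact.

```python
def comparisons(n: int) -> int:
    """Pairwise comparisons among n transcript frequencies: C = (n^2 - n) / 2."""
    return (n * n - n) // 2


# Figure 1 example: 3 transcripts in each of 2 populations, so n = 6
print(comparisons(6))  # 15

# Microarray example: 30,000 transcripts in each of 2 populations
binding_radiation = 30_000            # only like-with-like comparisons
sequencing = comparisons(30_000 * 2)  # every pair of frequencies
print(sequencing)                      # 1799970000, i.e. about 1.8 x 10^9
print(sequencing / binding_radiation)  # close to a 60,000-fold increase
```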

The scale of complexity of this problem in plants may be visualized by an analogy using plant ecology in the United States. There are approximately 30,000 plant species and 50 states in the U.S. Similarly, there are approximately 30,000 genes and 50 cell types in higher plants. Therefore, in terms of complexity, comparing the numbers of all transcripts between 2 cell types is analogous to comparing the numbers of all species between 2 U.S. states (equating to 1.8 × 10^9 possible comparisons). Comparing transcript populations for all 50 cell types (or species in 50 states) would result in 1.125 × 10^12 possible comparisons. Thus, monumental advances in bioinformatics and computational biology will be necessary to deal with the vast amount of data generated by sequence-based expression studies.
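The count in Eq. 1 is the binomial coefficient "n choose 2", so Python's `math.comb` reproduces the figures in this analogy. The sketch assumes, as the text does, about 30,000 genes per cell type; the helper name is illustrative.

```python
from math import comb

GENES = 30_000  # approximate gene number used in the analogy


def comparisons(cell_types: int, genes: int = GENES) -> int:
    """Eq. 1 written as n choose 2, with n = genes x cell types."""
    return comb(genes * cell_types, 2)


for k in (2, 50):
    print(k, format(comparisons(k), ".4g"))
# 2 1.8e+09
# 50 1.125e+12
```

The growth is quadratic in the number of populations: going from 2 to 50 cell types multiplies n by 25 but the number of comparisons by roughly 625.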

Universal comparability between experiments

A common problem with binding-radiation methods is the difficulty in comparing one experiment with another. As stated above, 2 different hybridization probes for the exact same gene often do not allow a direct, quantifiable comparison. The result is that the number of comparisons which can be made is restricted to the number of experiments in a particular laboratory (or sometimes using a particular method). For example, using binding-radiation methods, it is usually difficult (or impossible) for a laboratory which has quantified expression levels for 3 transcripts to make direct comparisons with another laboratory which quantified the same 3 transcripts in a different tissue. On the other hand, sequencing methods allow such comparisons and would therefore facilitate universal comparability within the literature and gene expression databases using standard units such as “number of a particular transcript / number of total transcripts” or “number of a particular transcript / cell”.
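A minimal Python sketch of how counts from two sequencing experiments, run at different depths, become directly comparable once expressed in the standard unit "number of a particular transcript / number of total transcripts". All transcript names and counts are hypothetical, and a real comparison would also need a statistical test of whether a difference is significant.

```python
# Hypothetical counts from two independent experiments (different labs and depths)
lab_1 = {"transcript_A": 120, "transcript_B": 30, "transcript_C": 850}
lab_1_total = 100_000
lab_2 = {"transcript_A": 300, "transcript_B": 15, "transcript_C": 2_000}
lab_2_total = 250_000


def per_total(counts, total):
    """Express each count as: number of a particular transcript / number of total transcripts."""
    return {name: n / total for name, n in counts.items()}


f1 = per_total(lab_1, lab_1_total)
f2 = per_total(lab_2, lab_2_total)

# Raw counts differ mainly because of sequencing depth; frequencies do not.
for name in f1:
    print(name, f1[name], f2[name], f2[name] / f1[name])
```

Here transcript_A has 120 versus 300 raw counts, yet the same frequency (0.0012) in both experiments.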

Obstacles and specifications for a universal sequencing-based method

Despite the future potential of sequencing-based methods for gene expression analysis, binding-radiation methods are currently more practical and robust for most experimental questions. Because sequencing is currently slow and expensive, it is impractical for many labs to sequence enough to complete an entire gene expression study. For tomato plants, which are among the most economically important species, there are 27 cDNA libraries publicly available (in the TIGR Tomato Gene Index) with more than 500 ESTs. Therefore, although there are enough data to perform gene expression analyses using sequence data, the number of experimental questions which can be addressed is limited.

Currently, there are 3 major obstacles impeding the use of sequencing for studies of gene expression (Table 1): 1) technology for sequencing individual cDNA molecules, 2) sequence quality control, and 3) methods for data analysis. With regard to the first obstacle, it has been shown that it is possible to sequence an individual DNA molecule (Braslavsky et al., 2003). However, the current method is successful for fewer than 10 nucleotides at a time. For single-molecule sequencing to be useful, full-length cDNA molecules must be sequenced rapidly, demanding significant advances in sequencing technology (Table 1).

The second obstacle involves a problem that is currently plaguing genome projects, transcriptome projects, and public sequence databases: sequence quality control (Table 1). Identifying false sequences (vectors, primers, adaptors, DNA from other organisms, etc.) in a sequencing project has proven to be a formidable task. The three main reasons for this are 1) contaminating sequences are often very short (<20 bp), 2) the full DNA sequences of only a few organisms are known, so identifying transcripts as “native” or “contaminating” is often challenging, and 3) a wide variety of molecular rearrangements (due to transposons, adaptor dimerization, etc.) can take place during the creation of a DNA library. When using sequencing for gene expression analysis, poor sequence quality control could lead to miscalculating the total number of transcripts in a population and, worse, to associating the wrong cDNA with an organism.

Currently, there is no tool that will reliably identify all types of contaminating DNA sequences. To eliminate such problems, an “EST Quality Algorithm” (EQUAL) must be devised and linked to public sequence databases such as GenBank and the TIGR Gene Indices.
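One ingredient an EQUAL-style tool would likely need is base-quality trimming based on Phred scores (Table 1, specification 4; Ewing and Green, 1998), where Q = -10 log10(P) and P is the estimated probability that a base call is wrong. The sketch below is a hypothetical illustration, not a component of any existing EQUAL implementation; the function names and the Q20 cutoff are assumptions.

```python
# Hypothetical sketch of one EQUAL-style step: Phred-based end trimming.
# Phred relation (Ewing and Green, 1998): Q = -10 * log10(P).


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score Q to the error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_low_quality_ends(seq: str, quals: list[int],
                          min_q: int = 20) -> tuple[str, list[int]]:
    """Remove bases from both ends until a base with quality >= min_q is reached.

    Low-quality bases inside the retained region are kept. The Q20 default
    is an assumed cutoff, not a value taken from the text.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have equal length")
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def expected_errors(quals: list[int]) -> float:
    """Sum of per-base error probabilities, i.e. the expected number of miscalls."""
    return sum(phred_to_error_prob(q) for q in quals)


if __name__ == "__main__":
    trimmed, kept = trim_low_quality_ends("ACGTAC", [5, 30, 30, 30, 10, 2])
    print(trimmed, expected_errors(kept))
```

Trimming only the ends keeps the sketch simple; a real quality filter would also have to decide what to do with low-quality bases in the middle of a read.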

Table 1. Specifications for a universal sequencing-based method of gene expression analysis. Specifications are based upon the 3 major obstacles shown in the left-hand column. A method which fulfilled the proposed specifications would allow for rapid gene discovery and gene expression analyses in any species. Significant advances are necessary for each specification to become a practical reality.

| Obstacle | Specification for a universal sequencing-based method | References |
| I. Technology for sequencing individual cDNA molecules | 1. Able to sequence individual cDNA molecules | Braslavsky et al., 2003 |
| | 2. Able to sequence individual full-length cDNA molecules | (none) |
| | 3. Able to sequence a large number of individual full-length cDNA molecules rapidly | (none) |
| II. Sequence quality control | 4. High sequence quality (Phred scores) for the entire length of cDNA molecules | Ewing and Green, 1998 |
| | 5. Able to distinguish cDNA of one organism from that of pathogens, etc. | See Chapter 2 |
| | 6. Able to distinguish genuine cDNA from cloning artifacts (vectors, primers, adapters, etc.) | See Chapter 2 |
| III. Methods for data analysis | 7. Clustering algorithms | Pertea et al., 2003 |
| | 8. Statistics (and visualization tools) to compare transcript frequencies | Stekel et al., 2000; Romualdi et al., 2001 |
| | 9. Methods for evaluating internal controls and/or housekeeping controls | See Chapter 3 (Coker and Davies, 2003) |
| | 10. Public database of expression data which is integrated with traditional sequence databases | Quackenbush et al., 2001 |


The third obstacle, methods for data analysis, is currently being addressed in the context of cDNA library analysis (Table 1). To allow uniform treatment of data among researchers, algorithms for clustering identical or similar transcripts, computing transcript frequencies, comparing transcript frequencies, ensuring sound internal/housekeeping controls, and storing and searching sequences must be brought together into one integrated “EST Analysis Algorithm” (EANAL). To be most useful, EANAL must be linked to public sequence databases such as GenBank and the TIGR Gene Indices.
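As an illustration of the “comparing transcript frequencies” step, the sketch below computes, for a single transcript cluster, its frequency in each library and a log-likelihood ratio statistic modeled on the R statistic of Stekel et al. (2000): R = Σ y_j ln(y_j / (N_j f)), where y_j is the number of ESTs from the cluster in library j, N_j is the total number of ESTs in that library, and f = Σ y_j / Σ N_j is the pooled frequency. The function names and example counts are hypothetical, and the significance thresholds needed to interpret R are not reproduced here.

```python
import math


def transcript_frequencies(counts: list[int], library_sizes: list[int]) -> list[float]:
    """Frequency of one transcript cluster in each library (count / ESTs in library)."""
    return [y / n for y, n in zip(counts, library_sizes)]


def r_statistic(counts: list[int], library_sizes: list[int]) -> float:
    """Log-likelihood ratio statistic for one transcript across several libraries.

    R = sum_j y_j * ln(y_j / (N_j * f)), where f = sum(y) / sum(N) is the
    pooled frequency. Libraries with y_j = 0 contribute nothing (0 * ln 0 -> 0).
    R is 0 when the transcript has the same frequency in every library and
    is positive otherwise.
    """
    if len(counts) != len(library_sizes):
        raise ValueError("need one count per library")
    total = sum(counts)
    if total == 0:
        return 0.0
    pooled = total / sum(library_sizes)
    return sum(y * math.log(y / (n * pooled))
               for y, n in zip(counts, library_sizes) if y > 0)


if __name__ == "__main__":
    # Hypothetical counts: 12 ESTs of one cluster among 2,000 from an unwounded
    # library and 40 among 2,500 from a wounded library.
    counts, sizes = [12, 40], [2000, 2500]
    print(transcript_frequencies(counts, sizes), r_statistic(counts, sizes))
```

Because R is computed per cluster, an EANAL-style pipeline would first need the clustering step (Table 1, specification 7) to assign each EST to a transcript.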

Based on the major obstacles stated above, a basic set of specifications emerges (Table 1) that would lead to a universal sequencing-based method of gene expression analysis, shown in generalized form in Figure 2.

[Figure 2 schematic. I. DNA sequencing microchip: well for application of cDNA sample; purifier (to filter non-DNA); separator (of individual cDNAs); stabilizer (of individual cDNA molecules); sequencer; microprocessor. Output: sequence data, which pass to II. EST quality algorithm (EQUAL) and then to III. EST analysis algorithm (EANAL).]

Figure 2. Theoretical blueprint for a universal sequencing-based method of gene expression analysis.

References

Braslavsky I., Herbert B., Kartalov E., Quake S.R. 2003. Sequence information can be obtained from single DNA molecules. Proc. Natl. Acad. Sci. USA 100: 3960-3964.

Coker J.S., Davies E. 2003. Selection of candidate housekeeping controls in tomato plants using EST data. Biotechniques 35: 740-748.

Ewing B., Green P. 1998. Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Res. 8: 186-194.

Pertea G., Huang X., Liang F., Antonescu V., Sultana R., Karamycheva S., Lee Y., White J., Cheung F., Parvizi B., Tsai J., Quackenbush J. 2003. TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 19: 651-652.

Quackenbush J., Cho J., Lee D., Liang F., Holt I., Karamycheva S., Parvizi B., Pertea G., Sultana R., White J. 2001. The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res. 29: 159-164.

Romualdi C., Bortoluzzi S., Danieli G.A. 2001. Detecting differentially expressed genes in multiple tag sampling experiments: comparative evaluation of statistical tests. Hum. Mol. Genet. 10: 2133-2141.

Stekel D.J., Git Y., Falciani F. 2000. The comparison of gene expression from multiple cDNA libraries. Genome Res. 10: 2055-2061.

Conclusions and future directions regarding the biology of systemic responses to fire damage

The overall goal of this dissertation was to characterize the array of transcripts that systemically accumulate after fire damage. Several conclusions regarding the biology of systemic responses to fire damage can be drawn.

After a tomato leaf is damaged by fire, many different transcripts accumulate in other parts of the plant. Most of these transcripts are highly conserved in plants, suggesting that the observed systemic response to fire damage is not unique to tomato plants and could even be a universal phenomenon in higher plants. Most of the transcripts fall into 5 functional classes: 1) enzymes of general metabolism; 2) protein synthesis, modification, and transport; 3) transcription; 4) membrane transport; 5) photosynthesis and respiration. Most of the transcripts were already present in unwounded tissues, but at lower levels than after wounding. After wounding, the accumulation of most transcripts peaked within 30 to 60 minutes, followed by a return to basal levels within 3 hours.

The systemic response to fire damage has components similar to those of other wound and stress responses. These include 4 of the 6 key enzymes of phenylpropanoid biosynthesis and the activated methyl cycle, proteinase inhibitors, leucine aminopeptidase, and many others. These common components suggest some universality in plant responses to different types of wounding and/or stress. On the other hand, the systemic response to fire damage also has components that differ from those of other wound responses, most notably transcripts associated with photosynthesis and respiration. It is unclear whether the accumulation of photosynthesis transcripts is simply a perturbation in an interconnected genetic network or whether it indicates a fundamental difference between the response to fire damage and other wound responses.

Based on these conclusions, there are several future directions for studies on the systemic response to fire damage (flame wounding) in plants. Most importantly, the 46 transcripts that have been shown to accumulate after fire damage should be tested for functionality. Do their corresponding polysomal mRNA levels increase? Do their corresponding protein levels increase? If polysomal mRNA and/or protein levels do increase, how do their kinetics compare with the accumulation of total mRNA?

What are these proteins doing? In particular, are the transcripts associated with photosynthesis being used to make functional proteins?

The second future direction is to extend the studies of gene expression in this dissertation to include root tissues. In natural ecosystems, fire often burns most or all of the aboveground mass of a plant. Therefore, it makes sense that a fundamental part of the systemic response to fire damage might include gene expression in the roots. Do the 46 transcripts which accumulate in leaves also accumulate in roots?

The third future direction is to understand how the entire plant transcriptome changes after flame wounding. The subtractive cDNA library allowed the discovery of up-regulated genes, but it cannot tell us how many genes are up-regulated, nor how many are down-regulated. Microarray experiments using tomato or Arabidopsis thaliana would provide a global perspective of transcript changes in plants. Microarray experiments could also be used to answer parts of the first and second future directions above (accumulation of polysomal mRNA and transcript accumulation in roots).

Because fire in natural ecosystems varies in intensity, speed, and chemical composition, laboratory experiments should eventually test how biological responses change as the stimulus (fire) changes. How does systemic transcript accumulation change when a leaf is charred? Do different types of fires evoke different types of responses? Can smoke by itself cause changes in gene expression?

The final future direction must be to take the knowledge gained from tomato plants in a laboratory setting and apply it to studying plants in the natural environment.

Even though the laboratory is powerful in its ability to separate one variable affecting a plant from conflicting variables, we can never assume we know the full story until we study plants in their own environment. There seems little doubt that responses to fire damage in nature are far richer in complexity than we could ever see using a single model organism in a laboratory. If only to appreciate the full scope and beauty of this complexity, physiological studies of fire damage in nature should be pursued.


Appendices

Appendix 1 was published electronically in 2003 as Supplementary Materials for Chapter 4.

Appendix 2 contains the GenBank entries associated with Chapter 5; these entries will be publicly released on September 1, 2004.

Appendix 3 consists of educational research completed over the course of this dissertation. Appendix 3-A was published in 2002 in the Journal of Natural Resources and Life Science Education 31, 44-47, Appendix 3-B will be submitted for publication, and Appendix 3-C has been submitted for publication. In addition, curricula associated with this dissertation have been published electronically in 2003 in the Genetics section of the Biology Lab Clearinghouse (http://blc.biolab.udel.edu/Coker-Davies/).


Appendix 1: V-ATPase amino acid alignments in Lycopersicon and Arabidopsis

Supplementary materials for Coker, Jones, and Davies (2003)

Columns 1-50
LeVHA-c1    -MSNFAGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRPE
LeVHA-c2    MASTFSGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRPE
LeVHA-c3    -MSNFAGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRPE
LeVHA-c4    MASTFSGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRPE
Consensus   MMSTFAGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRPE

Columns 51-100
LeVHA-c1    LVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKTKSYYLFDGYAHLSSGL
LeVHA-c2    LVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKTKSYYLFDGYAHLSSGL
LeVHA-c3    LVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKTKSYYLFDGYAHLSSGL
LeVHA-c4    LVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKTKSYYLFDGYAHLSSGL
Consensus   LVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKTKSYYLFDGYAHLSSGL

Columns 101-150
LeVHA-c1    ACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGLI
LeVHA-c2    ACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGLI
LeVHA-c3    ACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGLI
LeVHA-c4    ACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGLI
Consensus   ACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGLI

Columns 151-165
LeVHA-c1    VGIILSSRAGQSRAE
LeVHA-c2    VGIILSSRAGQSRAE
LeVHA-c3    VGIILSSRAGQSRAD
LeVHA-c4    VGIILSSRAGQSRAE
Consensus   VGIILSSRAGQSRAE

Figure 1. Alignment of c subunits in tomato.
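The consensus rows in these alignments summarize each column. The sketch below computes a simple majority-rule consensus; the rule is an assumption for illustration and is not necessarily the one used by the alignment software that produced these figures (for example, in Figure 1 the first consensus column is M even though two of the four sequences have a gap there, which a strict majority rule would not report).

```python
from collections import Counter


def majority_consensus(aligned: list[str], gap: str = "-",
                       min_fraction: float = 0.5) -> str:
    """Column-wise majority-rule consensus of equal-length aligned sequences.

    A column gets its most common residue only if that residue occurs in
    more than min_fraction of all sequences (gaps count toward the total
    but never become consensus residues); otherwise a space is written.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != gap)
        if not counts:
            consensus.append(" ")
            continue
        residue, n = counts.most_common(1)[0]
        consensus.append(residue if n / len(column) > min_fraction else " ")
    return "".join(consensus)


if __name__ == "__main__":
    # First two columns of the four tomato c subunits in Figure 1.
    print(repr(majority_consensus(["-M", "MA", "-M", "MA"])))
```

Under this strict rule both example columns come out blank, which shows why the figures' consensus lines should be read as the output of the software's own rule rather than a simple majority.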


Columns 1-50
LeVHA-c1 (1)      --MSNFAGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRP
LeVHA-c3 (1)      --MSNFAGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRP
LeVHA-c4 (1)      -MASTFSGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRP
LeVHA-c2 (1)      -MASTFSGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRP
AtVHA-c1 (1)      --MSTFSGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRP
AtVHA-c2 (1)      -MASTFSGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRP
AtVHA-c3 (1)      --MSTFSGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRP
AtVHA-c4 (1)      MASSGFSGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRP
AtVHA-c5 (1)      --MSTFSGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRP
Consensus (1)     MSTFSGDETAPFFGFLGAAAALVFSCMGAAYGTAKSGVGVASMGVMRP

Columns 51-100
LeVHA-c1 (49)     ELVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKTKSYYLFDGYAHLSSG
LeVHA-c3 (49)     ELVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKTKSYYLFDGYAHLSSG
LeVHA-c4 (50)     ELVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKTKSYYLFDGYAHLSSG
LeVHA-c2 (50)     ELVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKTKSYYLFDGYAHLSSG
AtVHA-c1 (49)     ELVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKAKSYYLFDGYAHLSSG
AtVHA-c2 (50)     ELVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKAKSYYLFDGYAHLSSG
AtVHA-c3 (49)     ELVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKAKSYYLFDGYAHLSSG
AtVHA-c4 (51)     ELVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKAKSYYLFDGYAHLSSG
AtVHA-c5 (49)     ELVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKAKSYYLFDGYAHLSSG
Consensus (51)    ELVMKSIVPVVMAGVLGIYGLIIAVIISTGINPKAKSYYLFDGYAHLSSG

Columns 101-150
LeVHA-c1 (99)     LACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGL
LeVHA-c3 (99)     LACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGL
LeVHA-c4 (100)    LACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGL
LeVHA-c2 (100)    LACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGL
AtVHA-c1 (99)     LACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGL
AtVHA-c2 (100)    LACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGL
AtVHA-c3 (99)     LACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGL
AtVHA-c4 (101)    LACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGL
AtVHA-c5 (99)     LACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGL
Consensus (101)   LACGLAGLSAGMAIGIVGDAGVRANAQQPKLFVGMILILIFAEALALYGL

Columns 151-166
LeVHA-c1 (149)    IVGIILSSRAGQSRAE
LeVHA-c3 (149)    IVGIILSSRAGQSRAD
LeVHA-c4 (150)    IVGIILSSRAGQSRAE
LeVHA-c2 (150)    IVGIILSSRAGQSRAE
AtVHA-c1 (149)    IVGIILSSRAGQSRAE
AtVHA-c2 (150)    IVGIILSSRAGQSRAE
AtVHA-c3 (149)    IVGIILSSRAGQSRAE
AtVHA-c4 (151)    IVGIILSSRAGQSRAE
AtVHA-c5 (149)    IVGIILSSRAGQSRAE
Consensus (151)   IVGIILSSRAGQSRAE

Figure 2. Alignment of c subunits in tomato and Arabidopsis.


Table 1. Subunit c amino acid identities.

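| % identity | LeVHA-c1 | LeVHA-c3 | LeVHA-c4 | LeVHA-c2 | AtVHA-c1 | AtVHA-c2 | AtVHA-c3 | AtVHA-c4 | AtVHA-c5 |
| LeVHA-c1 | 100 | 99 | 98 | 98 | 98 | 97 | 98 | 96 | 98 |
| LeVHA-c3 | | 100 | 97 | 97 | 98 | 96 | 98 | 96 | 98 |
| LeVHA-c4 | | | 100 | 100 | 98 | 99 | 98 | 97 | 98 |
| LeVHA-c2 | | | | 100 | 98 | 99 | 98 | 97 | 98 |
| AtVHA-c1 | | | | | 100 | 99 | 100 | 98 | 100 |
| AtVHA-c2 | | | | | | 100 | 99 | 98 | 99 |
| AtVHA-c3 | | | | | | | 100 | 98 | 100 |
| AtVHA-c4 | | | | | | | | 100 | 98 |
| AtVHA-c5 | | | | | | | | | 100 |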
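Pairwise identities such as those in the subunit identity tables can be computed directly from aligned sequences. The sketch below assumes one common convention (identical residues divided by the number of columns in which at least one of the two sequences has a residue); the source does not state which convention produced the tables, so values computed this way may differ slightly from the published ones.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumed convention: columns that are gaps in both sequences are ignored;
    every other column counts toward the denominator, and a column counts as
    identical only when both sequences carry the same residue.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    compared = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Last alignment block of LeVHA-c1 and LeVHA-c3 (Figure 2): one mismatch.
    print(round(percent_identity("IVGIILSSRAGQSRAE", "IVGIILSSRAGQSRAD")))
```

Rounding the result to an integer makes it directly comparable to the whole-number values reported in the tables; a full comparison would apply the function to the complete aligned rows rather than a single block.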

Columns 1-50
LeVHA-c''1 (1)    MSAASTMAVMGASSSWSRALIQISPYTFSAVGIAIAIGVSVLGAAWGIYI
LeVHA-c''2 (1)    ------MAGPSSSWSRALVQISPYTFAAVGIAIAIGVSVLGAAWGIYI
Consensus (1)     M G SSSWSRALIQISPYTFAAVGIAIAIGVSVLGAAWGIYI

Columns 51-100
LeVHA-c''1 (51)   TGSSLIGAAIKAPRITSKNLISVIFCEAVAIYGVIVAIILQTKLESVPAS
LeVHA-c''2 (43)   TGSSLIGAAIKAPRITSKNLISVIFCEAVAIYGVIVAIILQTKLESVPAS
Consensus (51)    TGSSLIGAAIKAPRITSKNLISVIFCEAVAIYGVIVAIILQTKLESVPAS

Columns 101-150
LeVHA-c''1 (101)  KIYAAESLRAGYAIFASGIIVGFANLVCGLCVGIIGSSCALSDAQNSTLF
LeVHA-c''2 (93)   QIYAPESLRAGYAIFASGIIVGFANLVCGLCVGIIGSSCALSDAQNSSLF
Consensus (101)   IYA ESLRAGYAIFASGIIVGFANLVCGLCVGIIGSSCALSDAQNSSLF

Columns 151-185
LeVHA-c''1 (151)  VKILVIEIFGSALGLFGVIVGIIMSAQATWPSKTA
LeVHA-c''2 (143)  VKILVIEIFGSALGLFGVIVGIIMSAQASWPSKGA
Consensus (151)   VKILVIEIFGSALGLFGVIVGIIMSAQASWPSK A

Figure 3. Alignment of c'' subunits in tomato.


Columns 1-50
LeVHA-c''1 (1)    MSAASTMAVMGASSSWSRALIQISPYTFSAVGIAIAIGVSVLGAAWGIYI
LeVHA-c''2 (1)    ------MAGPSSSWSRALVQISPYTFAAVGIAIAIGVSVLGAAWGIYI
AtVHA-c''1 (1)    ---MSGVVALGHASSWGAALVRISPYTFSAIGIAISIGVSVLGAAWGIYI
AtVHA-c''2 (1)    -----MSGVAIHASSWGAALVRISPYTFSAIGIAISIGVSVLGAAWGIYI
Consensus (1)     S MAVAGHASSWSRALVRISPYTFSAIGIAIAIGVSVLGAAWGIYI

Columns 51-100
LeVHA-c''1 (51)   TGSSLIGAAIKAPRITSKNLISVIFCEAVAIYGVIVAIILQTKLESVPAS
LeVHA-c''2 (43)   TGSSLIGAAIKAPRITSKNLISVIFCEAVAIYGVIVAIILQTKLESVPAS
AtVHA-c''1 (48)   TGSSLIGAAIEAPRITSKNLISVIFCEAVAIYGVIVAIILQTKLESVPSS
AtVHA-c''2 (46)   TGSSLIGAAIEAPRITSKNLISVIFCEAVAIYGVIVAIILQTKLESVPSS
Consensus (51)    TGSSLIGAAIKAPRITSKNLISVIFCEAVAIYGVIVAIILQTKLESVPAS

Columns 101-150
LeVHA-c''1 (101)  KIYAAESLRAGYAIFASGIIVGFANLVCGLCVGIIGSSCALSDAQNSTLF
LeVHA-c''2 (93)   QIYAPESLRAGYAIFASGIIVGFANLVCGLCVGIIGSSCALSDAQNSSLF
AtVHA-c''1 (98)   KMYDAESLRAGYAIFASGIIVGFANLVCGLCVGIIGSSCALSDAQNSTLF
AtVHA-c''2 (96)   KMYDAESLRAGYAIFASGIIVGFANLVCGLCVGIIGSSCALSDAQNSTLF
Consensus (101)   KIYDAESLRAGYAIFASGIIVGFANLVCGLCVGIIGSSCALSDAQNSTLF

Columns 151-185
LeVHA-c''1 (151)  VKILVIEIFGSALGLFGVIVGIIMSAQATWPSKTA
LeVHA-c''2 (143)  VKILVIEIFGSALGLFGVIVGIIMSAQASWPSKGA
AtVHA-c''1 (148)  VKILVIEIFGSALGLFGVIVGIIMSAQATWPTK--
AtVHA-c''2 (146)  VKILVIEIFGSALGLFGVIVGIIMSAQATWPTK--
Consensus (151)   VKILVIEIFGSALGLFGVIVGIIMSAQATWPSK A

Figure 4. Alignment of c'' subunits in tomato and Arabidopsis.

Table 2. Subunit c'' amino acid identities.

| % identity | LeVHA-c''1 | LeVHA-c''2 | AtVHA-c''1 | AtVHA-c''2 |
| LeVHA-c''1 | 100 | 90 | 87 | 86 |
| LeVHA-c''2 | | 100 | 86 | 87 |
| AtVHA-c''1 | | | 100 | 96 |
| AtVHA-c''2 | | | | 100 |


Columns 1-50
AtVHA-d1 (1)      MYGFEALTFNIHGGYLEAIVRGHRAGLLTTADYNNLCQCENLDDIKMHLS
AtVHA-d2 (1)      MYGFEALTFNIHGGYLEAIVRGHRAGLLTTADYNNLCQCENLDDIKMHLS
LeVHA-d1 (1)      MYGFEALTFNIHSGYLEAIVRGHRSGLLTAADYNNLCQCETLDDIKMHLS
Consensus (1)     MYGFEALTFNIHGGYLEAIVRGHRAGLLTTADYNNLCQCENLDDIKMHLS

Columns 51-100
AtVHA-d1 (51)     ATKYGSYLQNEPSPLHTTTIVEKCTLKLVDDYKHMLCQATEPMSTFLEYI
AtVHA-d2 (51)     ATKYGPYLQNEPSPLHTTTIVEKCTLKLVDDYKHMLCQATEPMSTFLEYI
LeVHA-d1 (51)     ATEYGPYLQNEPSPLHTTTIVEKCTVKLVDEFNHMLCQATEPLSTFLEYI
Consensus (51)    ATKYGPYLQNEPSPLHTTTIVEKCTLKLVDDYKHMLCQATEPMSTFLEYI

Columns 101-150
AtVHA-d1 (101)    RYGHMIDNVVLIVTGTLHERDVQELIEKCHPLGMFDSIATLAVAQNMREL
AtVHA-d2 (101)    RYGHMIDNVVLIVTGTLHERDVQELIEKCHPLGMFDSIATLAVAQNMREL
LeVHA-d1 (101)    RYGHMIDNVVLIVTGTLHERDVQELLEKCHPLGMFDSIASLAVAQNMREL
Consensus (101)   RYGHMIDNVVLIVTGTLHERDVQELIEKCHPLGMFDSIATLAVAQNMREL

Columns 151-200
AtVHA-d1 (151)    YRLVLVDTPLAPYFSECLTSEDLDDMNIEIMRNTLYKAYLEDFYKFCQKL
AtVHA-d2 (151)    YRLVLVDTPLAPYFSECLTSEDLDDMNIEIMRNTLYKAYLEDFYNFCQKL
LeVHA-d1 (151)    YRLVLVDTPLAPYFSECITSEDLDDMNIEIMRNTLYKAYLEDFYRFCQKL
Consensus (151)   YRLVLVDTPLAPYFSECLTSEDLDDMNIEIMRNTLYKAYLEDFYKFCQKL

Columns 201-250
AtVHA-d1 (201)    GGATAEIMSDLLAFEADRRAVNITINSIGTELTREDRKKLYSNFGLLYPY
AtVHA-d2 (201)    GGATAEIMSDLLAFEADRRAVNITINSIGTELTREDRKKLYSNFGLLYPY
LeVHA-d1 (201)    GGATAEIMSDLLSFEADRRAVNITINSIGTELTRDDRRKLYSNFGLLYPY
Consensus (201)   GGATAEIMSDLLAFEADRRAVNITINSIGTELTREDRKKLYSNFGLLYPY

Columns 251-300
AtVHA-d1 (251)    GHEELAICEDIDQVRGVMEKYPPYQAIFSKMSYGESQMLDKAFYEEEVRR
AtVHA-d2 (251)    GHEELAICEDIDQVRGVMEKYPPYQAIFSKMSYGESQMLDKAFYEEEVRR
LeVHA-d1 (251)    GHEELAICEDIDQVRGVMEKYPPYQSIFSKLSYGESQMLDKAFYEEEVKR
Consensus (251)   GHEELAICEDIDQVRGVMEKYPPYQAIFSKMSYGESQMLDKAFYEEEVRR

Columns 301-350
AtVHA-d1 (301)    LCLAFEQQFHYAVFFAYMRLREQEIRNLMWISECVAQNQKSRIHDSVVYM
AtVHA-d2 (301)    LCLAFEQQFHYAVFFAYMRLREQEIRNLMWISECVAQNQKSRIHDSVVYM
LeVHA-d1 (301)    LCLSFEQQFHYGVFFSYIRLREQEIRNLMWISECVSQNQKTRVHDSVVFI
Consensus (301)   LCLAFEQQFHYAVFFAYMRLREQEIRNLMWISECVAQNQKSRIHDSVVYM

Column 351
AtVHA-d1 (351)    F
AtVHA-d2 (351)    F
LeVHA-d1 (351)    F
Consensus (351)   F

Figure 5. Alignment of d subunits in tomato and Arabidopsis.

Table 3. Subunit d amino acid identities.

| % identity | AtVHA-d1 | AtVHA-d2 | LeVHA-d1 |
| AtVHA-d1 | 100 | 99 | 91 |
| AtVHA-d2 | | 100 | 92 |
| LeVHA-d1 | | | 100 |


Columns 1-50
LeVHA-e1 (1)      MGFLVTTLIFVAIGVIASLCARICCNRGPSTNLLHLTLIITATVCCWMMW
LeVHA-e3 (1)      MGFAVTSLIFVVVGVIASFGAGICCNRGPSTNLLHLTLIITATVCCWMMW
Consensus (1)     MGF VTSLIFV IGVIAS A ICCNRGPSTNLLHLTLIITATVCCWMMW

Columns 51-71
LeVHA-e1 (51)     AIVYLAQLKP-LIVPVLSEGE
LeVHA-e3 (51)     AIVYLAQLKPPLIVPILSEGE
Consensus (51)    AIVYLAQLKP LIVPILSEGE

Figure 6. Alignment of e subunits in tomato.

Columns 1-50
LeVHA-e1 (1)      MGFLVTTLIFVAIGVIASLCARICCNRGPSTNLLHLTLIITATVCCWMMW
LeVHA-e3 (1)      MGFAVTSLIFVVVGVIASFGAGICCNRGPSTNLLHLTLIITATVCCWMMW
AtVHA-e1 (1)      MGFLITTLIFVVVGIIASLCVRICCNRGPSTNLLHLTLVITATVCCWMMW
AtVHA-e2 (1)      MAFVVTSLIFAVVGIIASICTRICFNKGPSTNLLHLTLVITATVCCWMMW
Consensus (1)     MGFLVTSLIFVVVGIIASLCARICCNRGPSTNLLHLTLIITATVCCWMMW

Columns 51-71
LeVHA-e1 (51)     AIVYLAQLKP-LIVPVLSEGE
LeVHA-e3 (51)     AIVYLAQLKPPLIVPILSEGE
AtVHA-e1 (51)     AIVYIAQMNP-LIVPILSETE
AtVHA-e2 (51)     AIVYIAQMNP-LIVPILSEVE
Consensus (51)    AIVYIAQLNP LIVPILSEGE

Figure 7. Alignment of e subunits in tomato and Arabidopsis.

Table 4. Subunit e amino acid identities.

| % identity | LeVHA-e1 | LeVHA-e3 | AtVHA-e1 | AtVHA-e2 |
| LeVHA-e1 | 100 | 87 | 84 | 76 |
| LeVHA-e3 | | 100 | 80 | 77 |
| AtVHA-e1 | | | 100 | 86 |
| AtVHA-e2 | | | | 100 |


Columns 1-50
LeVHA-A (1)       MPSIVGGPMTTFEDSEKESEYGYVRKVSGPVVVADGMGGAAMYELVRVGH
AtVHA-A (1)       MPAFYGGKLTTFEDDEKESEYGYVRKVSGPVVVADGMAGAAMYELVRVGH
Consensus (1)     MPA GG LTTFED EKESEYGYVRKVSGPVVVADGMAGAAMYELVRVGH

Columns 51-100
LeVHA-A (51)      DNLIGEIIRLEGDSATIQVYEETAGLMVNDPVLRTHKPLSVELGPGILGN
AtVHA-A (51)      DNLIGEIIRLEGDSATIQVYEETAGLTVNDPVLRTHKPLSVELGPGILGN
Consensus (51)    DNLIGEIIRLEGDSATIQVYEETAGL VNDPVLRTHKPLSVELGPGILGN

Columns 101-150
LeVHA-A (101)     IFDGIQRPLKTIAKRSGDVYIPRGVSVPALDKDILWEFQPKKIGEGDLLT
AtVHA-A (101)     IFDGIQRPLKTIARISGDVYIPRGVSVPALDKDCLWEFQPNKFVEGDTIT
Consensus (101)   IFDGIQRPLKTIAK SGDVYIPRGVSVPALDKD LWEFQP K EGD IT

Columns 151-200
LeVHA-A (151)     GGDLYATVFENSLMEHRVALPPDAMGKITYIAPAGQYSLNDTVLELEFQG
AtVHA-A (151)     GGDLYATVFENTLMNHLVALPPDAMGKITYIAPAGQYSLKDTVIELEFQG
Consensus (151)   GGDLYATVFENSLM H VALPPDAMGKITYIAPAGQYSL DTVIELEFQG

Columns 201-250
LeVHA-A (201)     VKKQVTMLQTWPVRSPRPVASKLAADTPLLTGQRVLDALFPSVLGGTCAI
AtVHA-A (201)     IKKSYTMLQSWPVRTPRPVASKLAADTPLLTGQRVLDALFPSVLGGTCAI
Consensus (201)   IKK TMLQSWPVRSPRPVASKLAADTPLLTGQRVLDALFPSVLGGTCAI

Columns 251-300
LeVHA-A (251)     PGAFGCGKTVISQALSKYSNSDTVVYVGCGERGNEMAEVLMDFPQLTMTL
AtVHA-A (251)     PGAFGCGKTVISQALSKYSNSDAVVYVGCGERGNEMAEVLMDFPQLTMTL
Consensus (251)   PGAFGCGKTVISQALSKYSNSD VVYVGCGERGNEMAEVLMDFPQLTMTL

Columns 301-350
LeVHA-A (301)     PDGREESVMKRTTLVANTSNMPVAAREASIYTGITIAEYFIDMGYNVSMM
AtVHA-A (301)     PDGREESVMKRTTLVANTSNMPVAAREASIYTGITIAEYFRDMGYNVSMM
Consensus (301)   PDGREESVMKRTTLVANTSNMPVAAREASIYTGITIAEYF DMGYNVSMM

Columns 351-400
LeVHA-A (351)     ADSTSRWAEALREISGRLAEMPADSGYPAYLAARLASFYERAGKVKCLGG
AtVHA-A (351)     ADSTSRWAEALREISGRLAEMPADSGYPAYLAARLASFYERAGKVKCLGG
Consensus (351)   ADSTSRWAEALREISGRLAEMPADSGYPAYLAARLASFYERAGKVKCLGG

Columns 401-450
LeVHA-A (401)     PERTGSVTIVGAVSPPGGDFSDPVTSATLGIVQVFWGLDKKLAQRKHFPS
AtVHA-A (401)     PERNGSVTIVGAVSPPGGDFSDPVTSATLSIVQVFWGLDKKLAQRKHFPS
Consensus (401)   PER GSVTIVGAVSPPGGDFSDPVTSATL IVQVFWGLDKKLAQRKHFPS

Columns 451-500
LeVHA-A (451)     VNWLISYSKYSGALESFYEKFDPDFINIRTKAREVLQREDDLNEIVQLVG
AtVHA-A (451)     VNWLISYSKYSTALESFYEKFDPDFINIRTKAREVLQREDDLNEIVQLVG
Consensus (451)   VNWLISYSKYS ALESFYEKFDPDFINIRTKAREVLQREDDLNEIVQLVG

Columns 501-550
LeVHA-A (501)     KDALAETDKITLETAKLLREDYLAQNAFTPYDKFCPFYKSVWMLRNIIHF
AtVHA-A (501)     KDALAEGDKITLETAKLLREDYLAQNAFTPYDKFCPFYKSVWMMRNIIHF
Consensus (501)   KDALAE DKITLETAKLLREDYLAQNAFTPYDKFCPFYKSVWMLRNIIHF

Columns 551-600
LeVHA-A (551)     YNLANQAVERGAGMDGQKITYTLIKHRLGDLFYRLVSQKFEDPAEGEDVL
AtVHA-A (551)     YNLANQAVERAAGMDGQKITYTLIKHRLGDLFYRLVSQKFEDPAEGEDTL
Consensus (551)   YNLANQAVERAAGMDGQKITYTLIKHRLGDLFYRLVSQKFEDPAEGED L

Columns 601-623
LeVHA-A (601)     VGKFQKLHDDLVAGFRNLEDETR
AtVHA-A (601)     VEKFKKLYDDLNAGFRALEDETR
Consensus (601)   V KF KLHDDL AGFR LEDETR

Figure 8. Alignment of A subunits in tomato and Arabidopsis.


Table 5. Subunit A amino acid identities.

| % identity | LeVHA-A | AtVHA-A |
| LeVHA-A | 100 | 94 |
| AtVHA-A | | 100 |

Columns 1-50
LeVHA-B1 (1)      MGSAPNSIE-MEEGTLEVGMEYRTVSGVAGPLVILDKVKGPKYQEIVNIR
LeVHA-B2 (1)      MGKAKKNIENMEEGTLEVGMEYRTVSGVAGPLVILEKVKGPKYQEIVNIR
Consensus (1)     MG A IE MEEGTLEVGMEYRTVSGVAGPLVILDKVKGPKYQEIVNIR

Columns 51-100
LeVHA-B1 (50)     LGDGTTRRGQVLEVDGEKAVVQVFEGTSGIDNKYTTVQFTGEVLKTPVSL
LeVHA-B2 (51)     LGDGTTRRGQVLEVDGEKAVVQVFEGTSGIDNKYTTVQFTGEVLKTPVSL
Consensus (51)    LGDGTTRRGQVLEVDGEKAVVQVFEGTSGIDNKYTTVQFTGEVLKTPVSL

Columns 101-150
LeVHA-B1 (100)    DMLGRIFNGSGKPIDNGPPILPEAYRDISGSSINPSERTYPEEMIQTGIS
LeVHA-B2 (101)    DMLGRIFNGSGKPIDNGPPILPEAYRDISGSSINPSERTYPEEMIQTGIS
Consensus (101)   DMLGRIFNGSGKPIDNGPPILPEAYRDISGSSINPSERTYPEEMIQTGIS

Columns 151-200
LeVHA-B1 (150)    TVDVMNSIARGQKIPLFSAAGLPHNEIAAQICRQAGLVKRLEKSDNLLEG
LeVHA-B2 (151)    TIDVMNSIARGQKIPLFSAAGLPHNEIAAQICRQAGLVKRLEKSENLLED
Consensus (151)   TIDVMNSIARGQKIPLFSAAGLPHNEIAAQICRQAGLVKRLEKSDNLLE

Columns 201-250
LeVHA-B1 (200)    GEEDNFAIVFAAMGVNMETAQFFKRDFEENGSMERVTLFLNLANDPTIER
LeVHA-B2 (201)    SEADNFAIVFAAMGVNMETAQFFKRDFEENGSMERVTLFLNLANDPTIER
Consensus (201)   E DNFAIVFAAMGVNMETAQFFKRDFEENGSMERVTLFLNLANDPTIER

Columns 251-300
LeVHA-B1 (250)    IITPRIALTTAEYLAYECGKHVLVILTDMSSYADALREVSAAREEVPGRR
LeVHA-B2 (251)    IITPRIALTTAEYLAYECGKHVLVILTDMSSYADALREVSAAREEVPGRR
Consensus (251)   IITPRIALTTAEYLAYECGKHVLVILTDMSSYADALREVSAAREEVPGRR

Columns 301-350
LeVHA-B1 (300)    GYPGYMYTDLATIYERAGRIEGRTGSITQIPILTMPNDDITHPTPDLTGY
LeVHA-B2 (301)    GYPGYMYTDLATIYERAGRIEGRTGSITQIPILTMPNDDITHPTPDLTGY
Consensus (301)   GYPGYMYTDLATIYERAGRIEGRTGSITQIPILTMPNDDITHPTPDLTGY

Columns 351-400
LeVHA-B1 (350)    ITEGQIYIDRQLHNRQIYPPINVLPSLSRLMKSAIGEGMTRRDHSDVSNQ
LeVHA-B2 (351)    ITEGQIYIDRQLHNRQIYPPINVLPSLSRLMKSAIGEGMTRRDHADVSNQ
Consensus (351)   ITEGQIYIDRQLHNRQIYPPINVLPSLSRLMKSAIGEGMTRRDHADVSNQ

Columns 401-450
LeVHA-B1 (400)    LYANYAIGKDVQAMKAVVGEEALSSEDLLYLEFLDKFERKFVSQGAYDTR
LeVHA-B2 (401)    LYANYAIGKDVQAMKAVVGEEALSSEDLLYLEFLDKFERKFVSQGAYDTR
Consensus (401)   LYANYAIGKDVQAMKAVVGEEALSSEDLLYLEFLDKFERKFVSQGAYDTR

Columns 451-489
LeVHA-B1 (450)    NIFQSLDLAWTLLRIFPRELLHRIPAKTLDQYYSRDASN
LeVHA-B2 (451)    NIFQSLDLAWTLLRIFPRELLHRIPAKTLDQYYSRDAPN
Consensus (451)   NIFQSLDLAWTLLRIFPRELLHRIPAKTLDQYYSRDA N

Figure 9. Alignment of B subunits in tomato.


Columns 1-50
LeVHA-B1 (1)      MGSAPNSIE-MEEGTLEVGMEYRTVSGVAGPLVILDKVKGPKYQEIVNIR
LeVHA-B2 (1)      MGKAKKNIENMEEGTLEVGMEYRTVSGVAGPLVILEKVKGPKYQEIVNIR
AtVHA-B2 (1)      MGAAENNLE--MEGTLEIGMEYRTVSGVAGPLVILEKVKGPKYQEIVNIR
AtVHA-B3 (1)      --MVETSID-MEEGTLEIGMEYRTVSGVAGPLVILDKVKGPKYQEIVNIR
Consensus (1)     MGAAENSIE MEEGTLEIGMEYRTVSGVAGPLVILDKVKGPKYQEIVNIR

Columns 51-100
LeVHA-B1 (50)     LGDGTTRRGQVLEVDGEKAVVQVFEGTSGIDNKYTTVQFTGEVLKTPVSL
LeVHA-B2 (51)     LGDGTTRRGQVLEVDGEKAVVQVFEGTSGIDNKYTTVQFTGEVLKTPVSL
AtVHA-B2 (49)     LGDGTTRRGQVLEVDGEKAVVQVFEGTSGIDNKYTTVQFTGEVLKTPVSL
AtVHA-B3 (48)     LGDGSTRRGQVLEVDGEKAVVQVFEGTSGIDNKFTTVQFTGEVLKTPVSL
Consensus (51)    LGDGTTRRGQVLEVDGEKAVVQVFEGTSGIDNKYTTVQFTGEVLKTPVSL

Columns 101-150
LeVHA-B1 (100)    DMLGRIFNGSGKPIDNGPPILPEAYRDISGSSINPSERTYPEEMIQTGIS
LeVHA-B2 (101)    DMLGRIFNGSGKPIDNGPPILPEAYRDISGSSINPSERTYPEEMIQTGIS
AtVHA-B2 (99)     DMLGRIFNGSGKPIDNGPPILPEAYLDISGSSINPSERTYPEEMIQTGIS
AtVHA-B3 (98)     DMLGRIFNGSGKPIDNGPPILPEAYLDISGSSINPSERTYPEEMIQTGIS
Consensus (101)   DMLGRIFNGSGKPIDNGPPILPEAYRDISGSSINPSERTYPEEMIQTGIS

Columns 151-200
LeVHA-B1 (150)    TVDVMNSIARGQKIPLFSAAGLPHNEIAAQICRQAGLVKRLEKSDNLLEG
LeVHA-B2 (151)    TIDVMNSIARGQKIPLFSAAGLPHNEIAAQICRQAGLVKRLEKSENLLED
AtVHA-B2 (149)    TIDVMNSIARGQKIPLFSAAGLPHNEIAAQICRQAGLVKRLEKSDNLLEH
AtVHA-B3 (148)    TIDVMNSIARGQKIPLFSAAGLPHNEIAAQICRQAGLVKRLEKTENLIQE
Consensus (151)   TIDVMNSIARGQKIPLFSAAGLPHNEIAAQICRQAGLVKRLEKSDNLLED

Columns 201-250
LeVHA-B1 (200)    GE-EDNFAIVFAAMGVNMETAQFFKRDFEENGSMERVTLFLNLANDPTIE
LeVHA-B2 (201)    SE-ADNFAIVFAAMGVNMETAQFFKRDFEENGSMERVTLFLNLANDPTIE
AtVHA-B2 (199)    QE-DDNFAIVFAAMGVNMETAQFFKRDFEENGSMERVTLFLNLANDPTIE
AtVHA-B3 (198)    DHGEDNFAIVFAAMGVNMETAQFFKRDFEENGSMERVTLFLNLANDPTIE
Consensus (201)   E EDNFAIVFAAMGVNMETAQFFKRDFEENGSMERVTLFLNLANDPTIE

Columns 251-300
LeVHA-B1 (249)    RIITPRIALTTAEYLAYECGKHVLVILTDMSSYADALREVSAAREEVPGR
LeVHA-B2 (250)    RIITPRIALTTAEYLAYECGKHVLVILTDMSSYADALREVSAAREEVPGR
AtVHA-B2 (248)    RIITPRIALTTAEYLAYECGKHVLVILTDMSSYADALREVSAAREEVPGR
AtVHA-B3 (248)    RIITPRIALTTAEYLAYECGKHVLVILTDMSSYADALRFCCSRRSS--WK
Consensus (251)   RIITPRIALTTAEYLAYECGKHVLVILTDMSSYADALREVSAAREEVPGR

Columns 301-350
LeVHA-B1 (299)    RGYPGYMYTDLATIYERAGRIEGRTGSITQIPILTMPNDDITHPTPDLTG
LeVHA-B2 (300)    RGYPGYMYTDLATIYERAGRIEGRTGSITQIPILTMPNDDITHPTPDLTG
AtVHA-B2 (298)    RGYPGYMYTDLATIYERAGRIEGRKGSITQIPILTMPNDDITHPTPDLTG
AtVHA-B3 (296)    TWISGVYYTDLATIYERAGRIEGRKGSITQIPILTMPNDDITHPTPDLTG
Consensus (301)   RGYPGYMYTDLATIYERAGRIEGRTGSITQIPILTMPNDDITHPTPDLTG

Columns 351-400
LeVHA-B1 (349)    YITEGQIYIDRQLHNRQIYPPINVLPSLSRLMKSAIGEGMTRRDHSDVSN
LeVHA-B2 (350)    YITEGQIYIDRQLHNRQIYPPINVLPSLSRLMKSAIGEGMTRRDHADVSN
AtVHA-B2 (348)    YITEGQIYIDRQLHNRQIYPPINVLPSLSRLMKSAIGEGMTRRDHSDVSN
AtVHA-B3 (346)    YITEGQIYIDRQLHNRQIYPPINVLPSLSRLMKSAIGEGMTRKDHSDVSN
Consensus (351)   YITEGQIYIDRQLHNRQIYPPINVLPSLSRLMKSAIGEGMTRRDHSDVSN

Columns 401-450
LeVHA-B1 (399)    QLYANYAIGKDVQAMKAVVGEEALSSEDLLYLEFLDKFERKFVSQGAYDT
LeVHA-B2 (400)    QLYANYAIGKDVQAMKAVVGEEALSSEDLLYLEFLDKFERKFVSQGAYDT
AtVHA-B2 (398)    QLYANYAIGKDVQAMKAVVGEEALSSEDLLYLEFLDKFERKFVAQGAYDT
AtVHA-B3 (396)    QLYANYAIGKDVQAMKAVVGEEALSSEDLLYLEFLDKFERKFVMQGAYDT
Consensus (401)   QLYANYAIGKDVQAMKAVVGEEALSSEDLLYLEFLDKFERKFVSQGAYDT

Columns 451-490
LeVHA-B1 (449)    RNIFQSLDLAWTLLRIFPRELLHRIPAKTLDQYYSRDASN
LeVHA-B2 (450)    RNIFQSLDLAWTLLRIFPRELLHRIPAKTLDQYYSRDAPN
AtVHA-B2 (448)    RNIFQSLDLAWTLLRIFPRELLHRIPAKTLDQFYSRDTTN
AtVHA-B3 (446)    RNIFQSLDLAWTLLRIFPRELLHRIPAKTLDQFYSRDSTS
Consensus (451)   RNIFQSLDLAWTLLRIFPRELLHRIPAKTLDQFYSRDATN

Figure 10. Alignment of B subunits in tomato and Arabidopsis.

Table 6. Subunit B amino acid identities.

| % identity | LeVHA-B1 | LeVHA-B2 | AtVHA-B1 | AtVHA-B2 | AtVHA-B3 |
| LeVHA-B1 | 100 | 97 | 19 | 97 | 93 |
| LeVHA-B2 | | 100 | 19 | 96 | 92 |
| AtVHA-B1 | | | 100 | 20 | 18 |
| AtVHA-B2 | | | | 100 | 93 |
| AtVHA-B3 | | | | | 100 |

Columns 1-50
LeVHA-C (1)       MASRYWVVSLPVQQNSSTTSLWSRLQESISRHSFDTPLYRFNIPNLRVGT
AtVHA-C (1)       MTSRYWVVSLPVKD--SASSLWNRLQEQISKHSFDTPVYRFNIPNLRVGT
Consensus (1)     M SRYWVVSLPV S SSLW RLQE ISKHSFDTPLYRFNIPNLRVGT

Columns 51-100
LeVHA-C (51)      LDSLLALSDDLIKSNSFIEGVCSKTRRQIEELERVSGVLSSSLTVDGVPV
AtVHA-C (49)      LDSLLALGDDLLKSNSFVEGVSQKIRRQIEELERISGVESNALTVDGVPV
Consensus (51)    LDSLLAL DDLIKSNSFIEGV K RRQIEELERISGV S ALTVDGVPV

Columns 101-150
LeVHA-C (101)     DSYLTRFAWDEAKYPTMSPLKEIVDGIHSQVAKIEDDLKVRVSEYNNVRS
AtVHA-C (99)      DSYLTRFVWDEAKYPTMSPLKEVVDNIQSQVAKIEDDLKVRVAEYNNIRG
Consensus (101)   DSYLTRF WDEAKYPTMSPLKEIVD I SQVAKIEDDLKVRVAEYNNIR

Columns 151-200
LeVHA-C (151)     QLNAINRKQTGSLAVRDLSNLVKPADVVTSEHLTTLLAVVSKYSQKDWLS
AtVHA-C (149)     QLNAINRKQSGSLAVRDLSNLVKPEDIVESEHLVTLLAVVPKYSQKDWLA
Consensus (151)   QLNAINRKQSGSLAVRDLSNLVKP DIV SEHL TLLAVV KYSQKDWLA

Columns 201-250
LeVHA-C (201)     SYETLTTYVVPRSSKMLYEDNEYALYTVTLFNRDADNFKNKARERGFQIR
AtVHA-C (199)     CYETLTDYVVPRSSKKLFEDNEYALYTVTLFTRVADNFRIAAREKGFQVR
Consensus (201)   YETLT YVVPRSSK LFEDNEYALYTVTLF R ADNFK AREKGFQIR

Columns 251-300
LeVHA-C (251)     DFEHNPETQESRKQELEKLMQDQETFRSSLLQWCYTSYGEVFSSWMHFCA
AtVHA-C (249)     DFEQSVEAQETRKQELAKLVQDQESLRSSLLQWCYTSYGEVFSSWMHFCA
Consensus (251)   DFE E QESRKQEL KLMQDQES RSSLLQWCYTSYGEVFSSWMHFCA

Columns 301-350
LeVHA-C (301)     VRIFAESILRYGLPPSFLSVVLAPSIKSEKKVRSILESLCDSSNSNFWKA
AtVHA-C (299)     VRTFAESIMRYGLPPAFLACVLSPAVKSEKKVRSILERLCDSTNSLYWKS
Consensus (301)   VR FAESILRYGLPPAFLA VLAPAIKSEKKVRSILE LCDSSNS FWKA

Columns 351-377
LeVHA-C (351)     D-DEGGMAGFGGDTEAHPYVSFTINLV
AtVHA-C (349)     EEDAGAMAGLAGDSETHPYVSFTINLA
Consensus (351)   D D GAMAG AGDSE HPYVSFTINL

Figure 11. Alignment of C subunits in tomato and Arabidopsis.


Table 7. Subunit C amino acid identities.

| % identity | LeVHA-C | AtVHA-C |
| LeVHA-C | 100 | 80 |
| AtVHA-C | | 100 |

Columns 1-50
LeVHA-D (1)       MSGQTNRLVVVPTVTMLGVIKARLVGATRGHALLKKKSDALTVQFRQILK
AtVHA-D (1)       MAGQNARLNVVPTVTMLGVMKARLVGATRGHALLKKKSDALTVQFRALLK
Consensus (1)     MAGQ RL VVPTVTMLGVIKARLVGATRGHALLKKKSDALTVQFR ILK

Columns 51-100
LeVHA-D (51)      KIVSTKESMGDVMKNSSFALTEAKYAAGENIKHVVLENVQTATLKVRSRQ
AtVHA-D (51)      KIVTAKESMGDMMKTSSFALTEVKYVAGDNVKHVVLENVKEATLKVRSRT
Consensus (51)    KIVS KESMGDMMK SSFALTE KY AGDNIKHVVLENV ATLKVRSR

Columns 101-150
LeVHA-D (101)     ENIAGVKLPKFEHFSEGETKNDLTGLARGGQQVQACRAAYVKSIELLVEL
AtVHA-D (101)     ENIAGVKLPKFDHFSEGETKNDLTGLARGGQQVRACRVAYVKAIEVLVEL
Consensus (101)   ENIAGVKLPKFDHFSEGETKNDLTGLARGGQQV ACR AYVKAIELLVEL

Columns 151-200
LeVHA-D (151)     ASLQTSFLTLDEAIKTTNRRVNALENVVKPRLENTVLYIKGELDELERED
AtVHA-D (151)     ASLQTSFLTLDEAIKTTNRRVNALENVVKPKLENTISYIKGELDELERED
Consensus (151)   ASLQTSFLTLDEAIKTTNRRVNALENVVKPKLENTI YIKGELDELERED

Columns 201-250
LeVHA-D (201)     FFRLKKIQGYKKREVEKQMAAARLYAAEKSAEEFSLKRGISLGSAHNLLS
AtVHA-D (201)     FFRLKKIQGYKRREVERQAANAKEFAEEMVLEDISMQRGISINAARNFLV
Consensus (201)   FFRLKKIQGYKKREVEKQ A AK FA E ED SL RGISI AA N L

Columns 251-261
LeVHA-D (251)     HASQKDDDIIF
AtVHA-D (251)     GGAEKDSDIIF
Consensus (251)   AA KD DIIF

Figure 12. Alignment of D subunits in tomato and Arabidopsis.

Table 8. Subunit D amino acid identities.

| % identity | LeVHA-D | AtVHA-D |
| LeVHA-D | 100 | 80 |
| AtVHA-D | | 100 |


Columns 1-50
LeVHA-E1 (1)      MNDADVSKQIQQMVRFIRQEAEEKANEISVSAEEEFNIEKLQLVEAEKKK
LeVHA-E2 (1)      MNDADVSKQIQQMVRFIRQEAEEKANEISVSAEEEFNIEKLQLVEAEKKK
Consensus (1)     MNDADVSKQIQQMVRFIRQEAEEKANEISVSAEEEFNIEKLQLVEAEKKK

Columns 51-100
LeVHA-E1 (51)     IRQEYERKEKQVDVRKKIEYSMQLNASRIKVLQAQDDLVNTMKEAAAKEL
LeVHA-E2 (51)     IRQEYERKEKQVDVRKKIEYSMQLNASRIKVLQAQDDLVCSMKEAASKEL
Consensus (51)    IRQEYERKEKQVDVRKKIEYSMQLNASRIKVLQAQDDLV SMKEAAAKEL

Columns 101-150
LeVHA-E1 (101)    LNVSHHEHGIIDSILHHHHGGYKKLLHDLIVQSLLRLKEPCVLLRCRKHD
LeVHA-E2 (101)    LNVSHHHN------HHIYKKLLQALIVQSLLRLKEPSVLLRCREDD
Consensus (101)   LNVSHH H YKKLL LIVQSLLRLKEP VLLRCR D

Columns 151-200
LeVHA-E1 (151)    VHLVEHVLEGVKEEYAEKASVHQPEIIVDEIHLPPAPSHHNMHGPSCSGG
LeVHA-E2 (141)    VPLVEDVLDAAKEEYAEKSQVHAPEVIVDQIYLPPAPSHHNAHGPSCSGG
Consensus (151)   V LVE VLDA KEEYAEKA VH PEIIVD IHLPPAPSHHN HGPSCSGG

Columns 201-241
LeVHA-E1 (201)    VVLASRDGKIVCENTLDARLEVVFRKKLPEIRKCLFGQVAA
LeVHA-E2 (191)    VVLASRDGKIVCENTLDARLEVVFRKKLPEIRKCLFGQVAV
Consensus (201)   VVLASRDGKIVCENTLDARLEVVFRKKLPEIRKCLFGQVA

Figure 13. Alignment of E subunits in tomato.

Columns 1-50
LeVHA-E1 (1)      MNDADVSKQIQQMVRFIRQEAEEKANEISVSAEEEFNIEKLQLVEAEKKK
LeVHA-E2 (1)      MNDADVSKQIQQMVRFIRQEAEEKANEISVSAEEEFNIEKLQLVEAEKKK
AtVHA-E1 (1)      MNDADVSKQIQQMVRFIRQEAEEKANEISVSAEEEFNIEKLQLVEAEKKK
AtVHA-E2 (1)      MNDADVSKQIQQMVRFIRQEAEEKANEISISAEEEFNIERLQLLESAKRK
AtVHA-E3 (1)      MNDADASIQIQQMVRFIRQEAEEKANEISISSEEEFNIEKLQLVEAEKKK
Consensus (1)     MNDADVSKQIQQMVRFIRQEAEEKANEISVSAEEEFNIEKLQLVEAEKKK

Columns 51-100
LeVHA-E1 (51)     IRQEYERKEKQVDVRKKIEYSMQLNASRIKVLQAQDDLVNTMKEAAAKEL
LeVHA-E2 (51)     IRQEYERKEKQVDVRKKIEYSMQLNASRIKVLQAQDDLVCSMKEAASKEL
AtVHA-E1 (51)     IRQEYERKEKQVDVRKKIEYSMQLNASRIKVLQAQDDLVNTMKEAAAKEL
AtVHA-E2 (51)     LRQDYDRKLKQVDIRKRIDYSTQLNASRIKYLQAQDDVVTAMKDSAAKDL
AtVHA-E3 (51)     IRQEYEKKEKQVDVRKKIDYSMQLNASRIKVLQAQDDIVNAMKEEAAKQL
Consensus (51)    IRQEYERKEKQVDVRKKIEYSMQLNASRIKVLQAQDDLVNTMKEAAAKEL

Columns 101-150
LeVHA-E1 (101)    LNVSHHEHGIIDSILHHHHGGYKKLLHDLIVQSLLRLKEPCVLLRCRKHD
LeVHA-E2 (101)    LNVSHHHN------HHIYKKLLQALIVQSLLRLKEPSVLLRCREDD
AtVHA-E1 (101)    LNVSHHEHGIIDSILHHHHGGYKKLLHDLIVQSLLRLKEPCVLLRCRKHD
AtVHA-E2 (101)    LRVSNDKN------NYKKLLKSLIIESLLRLKEPSVLLRCREMD
AtVHA-E3 (101)    LKVSQHGF------FNHHHHQYKHLLKDLIVQCLLRLKEPAVLLRCREED
Consensus (101)   LNVSHH HHH YKKLL DLIVQSLLRLKEPSVLLRCRE D

Columns 151-200
LeVHA-E1 (151)    VHLVEHVLEGVKEEYAEKASVHQPEIIVDE---IHLPPAPSHHNMHGPSC
LeVHA-E2 (141)    VPLVEDVLDAAKEEYAEKSQVHAPEVIVDQ---IYLPPAPSHHNAHGPSC
AtVHA-E1 (151)    VHLVEHVLEGVKEEYAEKASVHQPEIIVDE---IHLPPAPSHHNMHGPSC
AtVHA-E2 (139)    KKVVESVIEDAKRQYAEKAKVGSPKITIDEKVFLPPPPNPKLPDSHDPHC
AtVHA-E3 (145)    LDIVESMLDDASEEYCKKAKVHAPEIIVDKD--IFLPPAPSDDDPHALSC
Consensus (151)   V LVE VLEGAKEEYAEKA VHAPEIIVDE IHLPPAPSHHN HGPSC

Columns 201-250
LeVHA-E1 (198)    SGGVVLASRDGKIVCENTLDARLEVVFRKKLPEIRKCLFGQVAA------
LeVHA-E2 (188)    SGGVVLASRDGKIVCENTLDARLEVVFRKKLPEIRKCLFGQVAV------
AtVHA-E1 (198)    SGGVVLASRDGKIVCENTLDARLEVVFRKKLPEIRKCLFGQVAA------
AtVHA-E2 (189)    SGGVVLASQDGKIVCENTLDARLDVAFRQKLPQIRTRLVGAPETSRA---
AtVHA-E3 (193)    AGGVVLASRDGKIVCENTLDARLEVAFRNKLPEFCSKGSFLEMCVDPKVA
Consensus (201)   SGGVVLASRDGKIVCENTLDARLEVVFRKKLPEIRKCLFGQVA

Columns 251-300
LeVHA-E1 (242)    ------
LeVHA-E2 (232)    ------
AtVHA-E1 (242)    ------
AtVHA-E2 (236)    ------
AtVHA-E3 (243)    LRQGWCSLMSDSNFITKEKLRDAKSMNPTGRRCPDPNGVEKKSMCYSSCK
Consensus (251)

Columns 301-323
LeVHA-E1 (242)    ------
LeVHA-E2 (232)    ------
AtVHA-E1 (242)    ------
AtVHA-E2 (236)    ------
AtVHA-E3 (293)    TQGFMGGSCQGHKGNYMCECYEG
Consensus (301)

Figure 14. Alignment of E subunits in tomato and Arabidopsis.
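Each Consensus row above summarizes the residues in every alignment column. The source does not say which rule the alignment software used to build these rows. What follows is only a minimal sketch of one common convention, a simple majority rule: a column receives a residue when more than half of all sequences share it there, and a blank otherwise. The function name `majority_consensus` and the 0.5 threshold are illustrative assumptions.

```python
from collections import Counter


def majority_consensus(aligned, threshold=0.5):
    """Majority-rule consensus of equal-length aligned sequences.

    Assumption (not taken from the source): a column gets a residue only if
    that residue occurs in more than `threshold` of all sequences; gap
    characters ('-') never count, and every other column becomes a blank.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n = len(aligned)
    out = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != "-")
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / n > threshold:
                out.append(residue)
                continue
        out.append(" ")
    return "".join(out)


print(majority_consensus(["MKV", "MKI", "MRI"]))  # prints MKI
```

A strict "more than half" threshold makes the winning residue unique whenever one exists. Ties therefore never need to be broken arbitrarily.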

Table 9. Subunit E amino acid identities (%).

          LeVHA-E1  LeVHA-E2  AtVHA-E1  AtVHA-E2  AtVHA-E3
LeVHA-E1  100       89        100       72        55
LeVHA-E2            100       89        75        56
AtVHA-E1                      100       72        55
AtVHA-E2                                100       48
AtVHA-E3                                          100
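The identity tables give a pairwise percentage for each pair of subunits, but the source does not define how that percentage was calculated. The sketch below uses one common convention as an assumption:

- Columns where both sequences have a gap are skipped.
- Every other column counts toward the denominator.
- Only identical residues count as matches.

Other conventions, such as dividing by the length of the shorter sequence, give different values. This helper (`percent_identity`, a hypothetical name) may therefore not reproduce Tables 9 to 12 exactly.

```python
def percent_identity(a, b):
    """Percent identity of two equal-length aligned sequences.

    Assumed convention (not stated in the source): columns in which both
    sequences have a gap are ignored; every other column counts toward the
    denominator, and only identical non-gap residues count as matches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # shared gap: not an aligned position
        compared += 1
        if x == y:  # shared gaps were skipped, so equal characters are residues
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy example with hypothetical sequences (not tomato or Arabidopsis data):
print(round(percent_identity("MKV-A", "MKIGA")))  # prints 60
```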

1 50
LeVHA-F    (1)    MANRAPVRTNNSALIAMIADEDTITGFLLAGVGNVDLRRKTNYLIVDSKT
AtVHA-F    (1)    MAGRATIPARNSALIAMIADEDTVVGFLMAGVGNVDIRRKTNYLIVDSKT
Consensus  (1)    MA RA I NSALIAMIADEDTI GFLLAGVGNVDIRRKTNYLIVDSKT

51 100
LeVHA-F    (51)   TVKQIEDAFKEFTTREDIAIVLISQYVANMIRFLVDSYNKPIPAILEIPS
AtVHA-F    (51)   TVRQIEDAFKEFSARDDIAIILLSQYIANMIRFLVDSYNKPVPAILEIPS
Consensus  (51)   TVKQIEDAFKEFS RDDIAIILISQYIANMIRFLVDSYNKPIPAILEIPS

101 130
LeVHA-F    (101)  KDHPYDPAHDSVLSRVKYLFSTESVAGDRR
AtVHA-F    (101)  KDHPYDPAHDSVLSRVKYLFSAESVSQR--
Consensus  (101)  KDHPYDPAHDSVLSRVKYLFS ESVA

Figure 15. Alignment of F subunits in tomato and Arabidopsis.


Table 10. Subunit F amino acid identities (%).

         LeVHA-F  AtVHA-F
LeVHA-F  100      82
AtVHA-F           100

1 50
LeVHA-G1   (1)    MESSRGGQNGIQLLLAAEQEAQRIVNVARTAKQARLKQAKEEAEKEIAEF
LeVHA-G2   (1)    MESNRGNQNGIQQLLGAEQEAQHIVNAARSAKQARLKQAKDEAEKEIAEF
Consensus  (1)    MES RG QNGIQ LLAAEQEAQ IVN ARSAKQARLKQAKDEAEKEIAEF

51 100
LeVHA-G1   (51)   RAYMEAEFQRKLEQTSGDSGANVKRLEIETNEKIEHLKTEASRVSADVVQ
LeVHA-G2   (51)   RAFMEAEFQRKLEQTSGDSGANVKRLDQETFAKIQHLKAESESISNDVVQ
Consensus  (51)   RAFMEAEFQRKLEQTSGDSGANVKRLD ET KI HLK EA IS DVVQ

101 111
LeVHA-G1   (101)  MLLRHVTTVKN
LeVHA-G2   (101)  MLLRQVTTVKN
Consensus  (101)  MLLR VTTVKN

Figure 16. Alignment of G subunits in tomato.

1 50
LeVHA-G1   (1)    MESSRGGQNGIQLLLAAEQEAQRIVNVARTAKQARLKQAKEEAEKEIAEF
LeVHA-G2   (1)    MESNRGNQNGIQQLLGAEQEAQHIVNAARSAKQARLKQAKDEAEKEIAEF
AtVHA-G1   (1)    MESNRG-QGSIQQLLAAEVEAQHIVNAARTAKMARLKQAKEEAEKEIAEY
AtVHA-G2   (1)    MES-----AGIQQLLAAEREAQQIVNAARTAKMTRLKQAKEEAETEVAEH
AtVHA-G3   (1)    MDSLRG-QGGIQMLLTAEQEAGRIVSAARTAKLARMKQAKDEAEKEMEEY
Consensus  (1)    MES RG QGGIQQLLAAEQEAQ IVNAARTAKMARLKQAKEEAEKEIAEF

51 100
LeVHA-G1   (51)   RAYMEAEFQRKLEQTSGDSGANVKRLEIETNEKIEHLKTEASRVSADVVQ
LeVHA-G2   (51)   RAFMEAEFQRKLEQTSGDSGANVKRLDQETFAKIQHLKAESESISNDVVQ
AtVHA-G1   (50)   KAQTEQDFQRKLEETSGDSGANVKRLEQETDTKIEQLKNEASRISKDVVE
AtVHA-G2   (46)   KTSTEQGFQRKLEATSGDSGANVKRLEQETDAKIEQLKNEATRISKDVVD
AtVHA-G3   (50)   RSRLEEEYQTQVSGT--DQEADAKRLDDETDVRITNLKESSSKVSKDIVK
Consensus  (51)   RA ME EFQRKLE TSGDSGANVKRLEQETD KIEQLK EASRISKDVV

101 111
LeVHA-G1   (101)  MLLRHVTTVKN
LeVHA-G2   (101)  MLLRQVTTVKN
AtVHA-G1   (100)  MLLKHVTTVKN
AtVHA-G2   (96)   MLLKNVTTVNN
AtVHA-G3   (98)   MLIKYVTTTAA
Consensus  (101)  MLLKHVTTVKN

Figure 17. Alignment of G subunits in tomato and Arabidopsis.


Table 11. Subunit G amino acid identities (%).

          LeVHA-G1  LeVHA-G2  AtVHA-G1  AtVHA-G2  AtVHA-G3
LeVHA-G1  100       81        77        69        54
LeVHA-G2            100       75        68        54
AtVHA-G1                      100       81        55
AtVHA-G2                                100       49
AtVHA-G3                                          100

1 50
LeVHA-H    (1)    MTTESVELTTEEVLRRDIPWETYMTTKLITGTGLQLLRRYDKKAESYKAQ
AtVHA-H    (1)    --MDQAELSIEQVLKRDIPWETYMNTKLVSAKGLQLLRRYDKKPESARAQ
Consensus  (1)    D ELS E VLKRDIPWETYM TKLISA GLQLLRRYDKK ES KAQ

51 100
LeVHA-H    (51)   LLDDDGPGYVRVFVTILRDIFKEETVEYVLALIDEMLTANPKRARLFHDK
AtVHA-H    (49)   LLDEDGPAYVHLFVSILRDIFKEETVEYVLALIYEMLSANPTRARLFHDE
Consensus  (51)   LLDDDGPAYV LFVSILRDIFKEETVEYVLALI EMLSANP RARLFHD

101 150
LeVHA-H    (101)  SLADEDTYEPFLRLLWKGNWFIQEKSCKILSLTVSARSKVQNGADANGDA
AtVHA-H    (99)   SLANEDTYEPFLRLLWKGNWFIQEKSCKILAWIISARPKAGNAVIGNG--
Consensus  (101)  SLA EDTYEPFLRLLWKGNWFIQEKSCKILA ISAR K NA ANG

151 200
LeVHA-H    (151)  SSSKKKITTIDDVLAGVVEWLCAQLRKPTHPTRSIASTINCLSTLLKEPV
AtVHA-H    (147)  ------IDDVLKGLVEWLCAQLKQPSHPTRGVPIAISCLSSLLKEPV
Consensus  (151)  IDDVL GLVEWLCAQLK PSHPTR I I CLSSLLKEPV

201 250
LeVHA-H    (201)  VRSSFVRADGVKLLVPLISPASTQQSIQLLYETCLCVWLLSYYEPAIEYL
AtVHA-H    (188)  VRSSFVQADGVKLLVPLISPASTQQSIQLLYETCLCIWLLSYYEPAIEYL
Consensus  (201)  VRSSFV ADGVKLLVPLISPASTQQSIQLLYETCLCIWLLSYYEPAIEYL

251 300
LeVHA-H    (251)  ATSRALTRLIEVVKGSTKEKVVRVVILTLRNLLSKGTFSAHMVDLGVLQI
AtVHA-H    (238)  ATSRTMQRLTEVVKHSTKEKVVRVVILTFRNLLPKGTFGAQMVDLGLPHI
Consensus  (251)  ATSR L RL EVVK STKEKVVRVVILT RNLL KGTF A MVDLGL I

301 350
LeVHA-H    (301)  VQSLKAQAWSDEDLLDALNQLEQGLKENIKKLSSFDKYKQEVLLGHLDWS
AtVHA-H    (288)  IHSLKTQAWSDEDLLDALNQLEEGLKDKIKKLSSFDKYKQEVLLGHLDWN
Consensus  (301)  I SLK QAWSDEDLLDALNQLE GLKD IKKLSSFDKYKQEVLLGHLDW

351 400
LeVHA-H    (351)  PMHKDPIFWRENINNFEENDFQILRVLITILDTSSDARTLAVACYDLSQF
AtVHA-H    (338)  PMHKETNFWRENVTCFEENDFQILRVLLTILDTSSDPRSLAVACFDISQF
Consensus  (351)  PMHKD FWRENI FEENDFQILRVLITILDTSSD RSLAVACFDISQF

401 450
LeVHA-H    (401)  IQCHSAGRIIVNDLKAKERVMRLLNHDNAEVTKNALLCIQRLFLGAKYAS
AtVHA-H    (388)  IQYHAAGRVIVADLKAKERVMKLINHENAEVTKNAILCIQRLLLGAKYAS
Consensus  (401)  IQ HAAGRIIV DLKAKERVMKLINHDNAEVTKNAILCIQRL LGAKYAS

451
LeVHA-H    (451)  FLQA
AtVHA-H    (438)  FLQA
Consensus  (451)  FLQA

Figure 18. Alignment of H subunits in tomato and Arabidopsis.


Table 12. Subunit H amino acid identities (%).

         LeVHA-H  AtVHA-H
LeVHA-H  100      77
AtVHA-H           100

Appendix 2: Annotated sequences for novel tomato transcripts/proteins

LOCUS       404 bp mRNA linear PLN 08-MAR-2004
DEFINITION  Acyl carrier protein.
ACCESSION   AY568716
KEYWORDS
SOURCE      tomato.
  ORGANISM  Lycopersicon esculentum
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; asterids; lamiids; Solanales; Solanaceae; Solanum; Lycopersicon.
REFERENCE   1  (bases 1 to 404)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Identification, accumulation, and functional prediction of novel tomato transcripts systemically up-regulated after fire damage
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 404)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-MAR-2004) Botany, North Carolina State University, Gardner Hall, Campus Box 7612, Raleigh, NC 27695, USA
FEATURES             Location/Qualifiers
     source          1..404
                     /organism="Lycopersicon esculentum"
                     /db_xref="taxon:4081"
     CDS             1..402
                     /codon_start=1
                     /product="Acyl carrier protein"
                     /translation="MASLSATCLRFGCSVNTSQINGGTVKLVSVGWGRSSAGFPSLRT
                     SRLRVAAAKAETIDKVISIVRKQLALPADTKVSPESTFTKDLGADSLDTVEIVMALEE
                     EFGIAVEEENSENIVTVQDAADLIEKLVEKK"
     transit_peptide 1..144
                     /note="Predicted chloroplast transit peptide"
     misc_feature    166..381
                     /note="Acyl carrier protein phosphopantetheine domain"
     misc_feature    268..270
                     /note="Phosphopantetheine attachment site (Serine-90)"
BASE COUNT 120 a 75 c 104 g 105 t
ORIGIN
        1 atggctagtc tttcagctac ttgtctcaga tttggctgtt ctgtcaacac atctcagata
       61 aacggaggca ctgtgaagtt ggtttcagtg ggttggggaa ggagtagtgc tggtttccct
      121 tctctaagaa catcccgcct tcgtgttgca gctgcaaagg cagagacaat tgataaggta
      181 ataagcatag tgagaaaaca actagcttta ccagcagaca ctaaggtcag ccctgaaagt
      241 actttcacta aggacctcgg agccgactct ctggacactg tagaaattgt gatggcccta
      301 gaagaagagt ttgggattgc agtagaagaa gagaactctg agaatattgt aacagttcaa
      361 gatgctgctg acttgattga aaaacttgtt gagaagaagt agac
//
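Feature locations in these records are nucleotide positions in the mRNA, but notes such as "Serine-90" refer to residues of the translated protein. In every record here the CDS starts at nucleotide 1 with /codon_start=1. Under that layout, nucleotide n falls in codon (n - 1) // 3 + 1. So 268..270 maps to residue 90, and the 1..144 transit peptide covers residues 1 to 48.

The sketch below assumes exactly that layout and does not handle other reading frames or CDS offsets. `nt_to_aa` is a hypothetical helper name, not part of any library.

```python
def nt_to_aa(start, end):
    """Map a 1-based inclusive nucleotide range to the 1-based residue range
    it overlaps, assuming the CDS begins at nucleotide 1 with codon_start=1."""
    first = (start - 1) // 3 + 1
    last = (end - 1) // 3 + 1
    return first, last


# /translation of AY568716, with the record's line breaks removed
ACP_TRANSLATION = (
    "MASLSATCLRFGCSVNTSQINGGTVKLVSVGWGRSSAGFPSLRT"
    "SRLRVAAAKAETIDKVISIVRKQLALPADTKVSPESTFTKDLGADSLDTVEIVMALEE"
    "EFGIAVEEENSENIVTVQDAADLIEKLVEKK"
)

first, last = nt_to_aa(268, 270)
print(first, ACP_TRANSLATION[first - 1])  # prints 90 S
```

The translation has 133 residues, which matches the CDS length: 402 nucleotides / 3 = 134 codons, minus the stop codon.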


LOCUS       1386 bp mRNA linear PLN 08-MAR-2004
DEFINITION  Adenylyl-sulfate reductase.
ACCESSION   AY568717
KEYWORDS
SOURCE      tomato.
  ORGANISM  Lycopersicon esculentum
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; asterids; lamiids; Solanales; Solanaceae; Solanum; Lycopersicon.
REFERENCE   1  (bases 1 to 1386)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Identification, accumulation, and functional prediction of novel tomato transcripts systemically up-regulated after fire damage
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 1386)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-MAR-2004) Botany, North Carolina State University, Gardner Hall, Raleigh, NC 27695, USA
FEATURES             Location/Qualifiers
     source          1..1386
                     /organism="Lycopersicon esculentum"
                     /db_xref="taxon:4081"
     transit_peptide 1..213
                     /note="Predicted chloroplast transit peptide"
     CDS             1..1386
                     /codon_start=1
                     /product="Adenylyl-sulfate reductase"
                     /translation="MALTFTSSSAIHGSLSSSSSSYEQPKVSQLGTFQPLDRPQLLSS
                     TVLNSRRRSAVKPLYAEPKRNDSIVPSAATIVAPEVGESVEAEDFEKLAKELQNASPL
                     EVMDKALEKFGDDIAIAFSGAEDVALIEYAHLTGRPYRVFSLDTGRLNPETYQLFDTV
                     EKHYGIRIEYMFPDSVEVQALVRTKGLFSFYEDGHQECCRVRKVRPLRRALKGLRAWI
                     TGQRKDQSPGTRSEIPIVQVDPSFEGLDGGAGSLVKWNPVANVDGKDIWNFLRAMNVP
                     VNSLHSQGYVSIGCEPCTRPVLPGQHEREGRWWWEDAKAKECGLHKGNIKDETVNGAA
                     QTNGTATVADIFDTKDIVTLSKPGVENLVKLEDRREPWLVVLYAPWCQFCQAMEGSYV
                     ELAEKLAGSGVKVGKFRADGDQKAFAQEELQLGSFPTILFFPKHSSKAIKYPSEKRDV
                     DSLLAFVNALR"
     misc_feature    19..63
                     /note="Serine-rich region"
     misc_feature    346..963
                     /note="Phosphoadenosine phosphosulfate reductase domain"
     misc_feature    1048..1377
                     /note="Thioredoxin domain 2"
BASE COUNT 361 a 280 c 368 g 377 t
ORIGIN
        1 atggctttga ctttcacttc ttcatctgca attcatggct ctttgtcttc ttcatcttct
       61 tcttatgaac aacccaaagt atcccaattg ggtacctttc agccattgga taggcctcaa
      121 ctattgtcgt caactgtttt gaattctcgg aggcgttcgg cagtgaagcc attgtatgct
      181 gaacctaaga ggaatgattc aatagttccg tcagcagcta ccatcgtggc tcctgaggta
      241 ggagagagtg ttgaggcaga ggactttgag aaattggcta aggagcttca aaatgcttcc
      301 cctcttgagg ttatggacaa agcacttgag aaatttggag atgacattgc tattgctttc
      361 agtggtgctg aagatgttgc tttgatagag tacgcacatt taactggacg accatacaga
      421 gtattcagcc ttgatactgg gaggttgaac ccggagacat accaattatt tgacacagtg
      481 gagaagcact atggcattcg cattgaatac atgttccctg attcagttga agttcaggcg
      541 ttggttagga ccaaagggct tttctctttc tatgaggatg gccaccaaga gtgttgccgt
      601 gtaaggaagg ttaggccttt gaggagagct ctaaagggct tacgcgcctg gatcacaggc
      661 cagcgtaaag atcagtcccc tggaactcga tcagaaatcc ccattgttca ggtggaccct
      721 tcttttgagg ggttggatgg cggtgctggt agcttggtga agtggaaccc tgtggctaat
      781 gtggacggaa aagatatttg gaacttcctg cgtgccatga atgtgcctgt gaactcattg
      841 cattcacaag gatatgtatc cattggatgc gaaccttgca caaggccagt tctaccaggg
      901 caacacgaga gagagggaag atggtggtgg gaagatgcca aggccaagga gtgtggcttg
      961 cacaagggca acatcaagga tgaaactgta aatggcgctg cccaaacaaa tggtactgct
     1021 accgttgctg atatttttga taccaaggac attgttacct tgagtaagcc tggagttgag
     1081 aacctagtaa aattggaaga ccgaagagag ccttggctcg ttgttcttta tgcaccttgg
     1141 tgccaatttt gccaggcaat ggaaggatcc tatgttgaat tggctgagaa gttggctggt
     1201 tctggtgtga aagtagggaa attcagggca gatggtgacc agaaagcatt tgcacaagaa
     1261 gaattgcagc ttggcagctt ccctacaata ctcttcttcc caaagcactc ttcaaaggcc
     1321 attaagtacc cttcagagaa gagggacgta gactccttgc tggcttttgt gaatgctctc
     1381 agatga
//
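One way to cross-check a /translation qualifier is to translate the ORIGIN sequence directly. The sketch below makes three assumptions:

- The standard genetic code (NCBI translation table 1) applies, since these records give no /transl_table.
- The CDS starts at nucleotide 1 with codon_start=1.
- Translation stops at the first stop codon.

`translate_cds` is a hypothetical helper name, not the software used to annotate these records. Translating the first 30 nucleotides of AY568717 reproduces the first ten residues of its /translation, MALTFTSSSA.

```python
BASES = "tcag"
# Standard genetic code (NCBI table 1); codons ordered ttt, ttc, tta, ttg, tct, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_cds(cds):
    """Translate a CDS read from codon_start=1 until the first stop codon.

    Position numbers and spaces (as in ORIGIN lines) are discarded; any codon
    containing a symbol other than a, c, g or t is rendered as 'X'.
    """
    seq = "".join(ch for ch in cds.lower() if ch.isalpha())
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*":
            break  # stop codon ends the CDS
        protein.append(aa)
    return "".join(protein)


print(translate_cds("1 atggctttga ctttcacttc ttcatctgca"))  # prints MALTFTSSSA
```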

LOCUS       717 bp mRNA linear PLN 08-MAR-2004
DEFINITION  Unknown protein.
ACCESSION   AY568718
KEYWORDS
SOURCE      tomato.
  ORGANISM  Lycopersicon esculentum
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; asterids; lamiids; Solanales; Solanaceae; Solanum; Lycopersicon.
REFERENCE   1  (bases 1 to 717)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Identification, accumulation, and functional prediction of novel tomato transcripts systemically up-regulated after fire damage
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 717)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-MAR-2004) Botany, North Carolina State University, Gardner Hall, Raleigh, NC 27695, USA
FEATURES             Location/Qualifiers
     source          1..717
                     /organism="Lycopersicon esculentum"
                     /db_xref="taxon:4081"
     transit_peptide 1..132
                     /note="Predicted chloroplast transit peptide"
     CDS             1..717
                     /codon_start=1
                     /product="Unknown protein"
                     /translation="MACAALSANSCTIASSSTGRLSFSTYQKDSKLRQRHSLVRFRVR
                     ASTDDSDCNAEECAPDKEVGKVSMEWVAMDNTKVVGTFPPRKPRGWTGYVEKDTAGQT
                     NIYSVEPAVYVAESAISSGTAGTSSDGAENTKAISAGIALISVAAASSILLQVGKNSP
                     PPIQTVEYRGPSLSYYINKLKPAEIVQASITEAPTAPETEEVAITPEVESSAPEAPAP
                     QVEVQSEAPQDTSSSSSNIS"
     misc_feature    403..459
                     /note="Predicted transmembrane region"
BASE COUNT 210 a 174 c 165 g 168 t
ORIGIN
        1 atggcttgtg ctgctttatc agcaaacagc tgcaccatag cttcatcgtc tactggacga
       61 ttgagctttt ccacatacca aaaggactca aaattgaggc aaagacacag tctcgtccga
      121 ttcagagttc gggcttcaac tgacgattct gattgcaatg ctgaagaatg tgccccagac
      181 aaggaggttg ggaaggtgag catggaatgg gtagccatgg acaacaccaa agtggttggt
      241 acatttccac ctcgtaagcc gcgtggctgg acagggtatg ttgagaagga tactgctggg
      301 cagacaaata tatactctgt tgagcctgca gtttatgtag cagaaagtgc tataagctct
      361 ggtactgcag gcacctcatc tgatggagca gagaacacca aagctatttc agctgggata
      421 gccttaatct ctgttgcagc tgcttcatcg attctccttc aagttgggaa gaactcacct
      481 cctccgatac aaacagtgga gtacagggga ccatccctta gctactatat caacaagctt
      541 aagccagcgg aaatagtcca agcttcaata accgaagcac caactgcacc agaaaccgaa
      601 gaagtagcaa ttacaccaga agttgaaagc tctgctccag aagctcctgc tccacaagtt
      661 gaagtccaat ctgaagcccc tcaggacact tcaagttcaa gttctaacat ctcttag
//
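Each record's BASE COUNT line should agree with its ORIGIN block. For AY568718, for example, 210 + 174 + 165 + 168 = 717, which is the stated sequence length.

A minimal sketch for recomputing the counts keeps only the a, c, g and t characters and tallies them. `base_count` is a hypothetical helper, not the software used to prepare these records. Ambiguity codes such as n would be silently ignored by this sketch.

```python
from collections import Counter


def base_count(origin):
    """Recompute a GenBank-style BASE COUNT from ORIGIN text.

    Position numbers, spaces and any non-acgt symbols are ignored.
    """
    counts = Counter(ch for ch in origin.lower() if ch in "acgt")
    return {base: counts[base] for base in "acgt"}


print(base_count("1 atggctagtc tttcagctac"))
# prints {'a': 4, 'c': 5, 'g': 4, 't': 7}
```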

LOCUS       693 bp mRNA linear PLN 08-MAR-2004
DEFINITION  Photosystem II oxygen-evolving complex protein 3 (PsbQ).
ACCESSION   AY568719
KEYWORDS
SOURCE      tomato.
  ORGANISM  Lycopersicon esculentum
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; asterids; lamiids; Solanales; Solanaceae; Solanum; Lycopersicon.
REFERENCE   1  (bases 1 to 693)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Identification, accumulation, and functional prediction of novel tomato transcripts systemically up-regulated after fire damage
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 693)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-MAR-2004) Botany, North Carolina State University, Gardner Hall, Raleigh, NC 27695, USA
FEATURES             Location/Qualifiers
     source          1..693
                     /organism="Lycopersicon esculentum"
                     /db_xref="taxon:4081"
     CDS             1..693
                     /codon_start=1
                     /product="Photosystem II oxygen-evolving complex protein 3 (PsbQ)"
                     /translation="MAHAMASMGGLIGSSQTVLDGSLQLSGSARLSTVSTNRIALSRP
                     GLTVRAQQGSVDIETSRRAMIGLVAAGLAGSVAKAAFAEARSIKVGPPPPPSGGLPGT
                     LNSDEARDFSLPLKNRFYLQPLTPAEAAQRVKDSAKEIVSVKDFIDKKAWPYVQNDLR
                     LRAEYLRYDLKTVISAMPKEQKGKLQDLSGKLFKTISDLDHAAKTKNSAEAQKYYAET
                     VTTLNDVLANLG"
     transit_peptide 1..147
                     /note="Predicted chloroplast transit peptide"
     misc_feature    193..246
                     /note="Predicted transmembrane region"
     misc_feature    553..636
                     /note="Predicted coiled coil"
BASE COUNT 178 a 165 c 171 g 179 t
ORIGIN
        1 atggctcatg ctatggcttc tatgggtggc ctaattggtt cttcacaaac tgtcttggat
       61 ggtagcctcc agcttagtgg ctcagcccgc ttgagtactg ttagcaccaa cagaattgcc
      121 ttgtctagac caggactcac tgtcagagcc caacaggggt ctgttgacat cgaaactagc
      181 cgtagagcca tgattggtct tgttgctgct ggcctagctg gttccgttgc taaagcagct
      241 tttgctgaag ccaggtcaat taaggttggc cccccacctc ctccctcggg tggattgcct
      301 ggaactttga actcagatga ggcaagggac ttcagtttgc cattgaagaa taggttttac
      361 cttcaaccgt tgactccagc tgaggcagcc cagagagtta aggattcagc caaggagatt
      421 gttagtgtca aggatttcat cgacaagaag gcctggcctt acgtccagaa tgaccttcgt
      481 ctcagagcag aataccttcg ctatgacctt aagactgtta tctctgctat gccaaaagaa
      541 cagaagggaa aactccagga tctgtctgga aagctcttta agaccattag tgatctggac
      601 catgcagcaa agaccaagaa cagtgctgaa gcacagaagt actatgctga aactgtaact
      661 accttaaatg atgttttggc caacctgggc tag
//

LOCUS       1224 bp mRNA linear PLN 08-MAR-2004
DEFINITION  Putative anion:sodium symporter.
ACCESSION   AY568720
KEYWORDS
SOURCE      tomato.
  ORGANISM  Lycopersicon esculentum
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; asterids; lamiids; Solanales; Solanaceae; Solanum; Lycopersicon.
REFERENCE   1  (bases 1 to 1224)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Identification, accumulation, and functional prediction of novel tomato transcripts systemically up-regulated after fire damage
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 1224)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-MAR-2004) Botany, North Carolina State University, Gardner Hall, Raleigh, NC 27695, USA
FEATURES             Location/Qualifiers
     source          1..1224
                     /organism="Lycopersicon esculentum"
                     /db_xref="taxon:4081"
     transit_peptide 1..93
                     /note="Predicted mitochondria_chloroplast transit peptide"
     CDS             1..1224
                     /note="Involved in sulfate assimilation by which inorganic sulfate is processed and incorporated into sulfated compounds."
                     /codon_start=1
                     /product="Putative anion:sodium symporter"
                     /translation="MASLSRFIGKQCKLQCSDTLQRPSYGFCVRRSPTHLSMGMRNKD
                     EIGRYNLFINQNQSKTSLVQSPCNRKIVCCEAASNVSGESSSTGMTQYEKIIETLTTL
                     FPLWVILGTIIGIYKPSAVTWLETDLFTLGLGFLMLSMGLTLTFDDFRRCLRNPWTVG
                     VGFLAQYFIKPLLGFTIAMALKLSAPLATGLILVSCCPGGQASNVATYISKGNVALSV
                     LMTTCSTVGAIVMTPLLTKLLAGQLVPVDAAGLAISTFQVVLVPTVIGVLSNEFFPKF
                     TSKIVTITPLIGVILTTLLCASPIGQVADVLKTQGAQLLLPVAALHAAAFFLGYQISK
                     FSFGESTSRTISIECGMQSSALGFLLAQKHFTNPLVAVPSAVSVVCMALGGSALAVYW
                     RNQPIPVDDKDDFKE"
     misc_feature    295..348
                     /note="Predicted transmembrane region"
     misc_feature    384..919
                     /note="Putative leucine zipper motif"
     misc_feature    385..438
                     /note="Predicted transmembrane region"
     misc_feature    475..528
                     /note="Predicted transmembrane region"
     misc_feature    547..600
                     /note="Predicted transmembrane region"
     misc_feature    637..705
                     /note="Predicted transmembrane region"
     misc_feature    742..801
                     /note="Predicted transmembrane region"
     misc_feature    838..903
                     /note="Predicted transmembrane region"
     misc_feature    940..993
                     /note="Predicted transmembrane region"
     misc_feature    1030..1083
                     /note="Predicted transmembrane region"
     misc_feature    1120..1173
                     /note="Predicted transmembrane region"
BASE COUNT 314 a 252 c 268 g 390 t
ORIGIN
        1 atggcttctc tgtccagatt tattgggaaa caatgtaaat tgcagtgttc agacacactt
       61 cagagaccaa gttatgggtt ttgtgttaga aggagtccga cccatttgag tatgggtatg
      121 agaaataaag atgagattgg aagatataat ttgttcatca atcaaaatca aagtaagact
      181 tccctagttc aatccccgtg caatcgcaaa atagtatgtt gcgaggcagc atcaaatgtg
      241 tctggggaaa gctcttccac tggaatgacc caatatgaga aaataattga gactttgacc
      301 accctttttc ctctatgggt tatattgggt acaatcattg gcatatataa accttctgcg
      361 gtcacttggt tggaaacaga tctcttcact ctgggtttgg gatttctaat gctttcaatg
      421 ggtttgacac taacatttga cgacttccga agatgtttaa ggaacccatg gactgtaggt
      481 gttggatttc tcgctcagta cttcattaaa ccactcttag gcttcaccat agcaatggct
      541 ctaaagttgt ccgccccact tgctactggt ctgatcttgg tgtcatgctg tcctggaggc
      601 caagcttcta atgtggcaac atatatttca aaggggaatg tagccctctc tgttctaatg
      661 acaacgtgtt caacagttgg agctattgtg atgacacccc tgctgactaa gcttttagct
      721 ggtcagcttg tcccagttga tgctgccggt cttgctatca gcacctttca agttgtgcta
      781 gtgccaacag ttattggagt tctatcaaat gagttttttc ctaagtttac gtcaaaaatc
      841 gtcaccatca cacctttaat tggagttatt ctgactactc ttctttgtgc tagtccgatt
      901 ggtcaagtcg cagatgtgct gaaaactcag ggagcacagt tacttctccc tgtggcggcc
      961 ttgcatgctg cagcattttt tctgggttac cagatttcaa aattttcatt tggtgaatca
     1021 acatccagaa ctatttcgat agaatgtgga atgcagagtt cggcactcgg atttctactt
     1081 gcacaaaagc atttcacaaa ccctcttgtt gctgtacctt ctgctgttag tgttgtctgc
     1141 atggcacttg gtggaagtgc tctagctgtg tactggagga atcaaccaat tcctgttgat
     1201 gacaaggatg attttaagga gtaa
//

LOCUS       555 bp mRNA linear PLN 08-MAR-2004
DEFINITION  Unknown wound/stress protein.
ACCESSION   AY568721
KEYWORDS
SOURCE      tomato.
  ORGANISM  Lycopersicon esculentum
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; asterids; lamiids; Solanales; Solanaceae; Solanum; Lycopersicon.
REFERENCE   1  (bases 1 to 555)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Identification, accumulation, and functional prediction of novel tomato transcripts systemically up-regulated after fire damage
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 555)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-MAR-2004) Botany, North Carolina State University, Gardner Hall, Campus Box 7612, Raleigh, NC 27695, USA
FEATURES             Location/Qualifiers
     source          1..555
                     /organism="Lycopersicon esculentum"
                     /db_xref="taxon:4081"
     transit_peptide 1..84
                     /note="Predicted secretory pathway transit peptide (i.e. golgi or ER)"
     CDS             1..555
                     /codon_start=1
                     /product="Unknown wound/stress protein"
                     /translation="MGVAAQVNQMWFNLMIVLFFVSISSISAEDCVYTAYIRTGSIIK
                     AGTDSNISLTLYDANGYGLRIKNIEAWGGLMGPGYNYFERGNLDIFSGKGPCVNGPIC
                     KMNLTSDGTGPHHGWYCNYVEVTVTGAKKQCNQQLFTVNQWLGTDVSPYKLTAIRNNC
                     KNKYESGELKPLYDSESFSIVDVI"
     misc_feature    91..465
                     /note="Lipoxygenase homology domain"
BASE COUNT 158 a 110 c 124 g 163 t
ORIGIN
        1 atgggagtag cagctcaagt taaccaaatg tggttcaatc tcatgatcgt cctcttcttc
       61 gtctctattt cttctatttc tgctgaagat tgtgtttaca cagcttacat tcgcactggt
      121 tcaatcataa aagctggtac cgattcaaac atttcgttga ctctctacga tgccaatggc
      181 tatggacttc gaataaaaaa catagaggcc tggggtggac ttatgggtcc aggttacaac
      241 tactttgaaa gaggaaactt ggatatcttc agtgggaaag gtccttgtgt gaatggaccg
      301 atctgtaaaa tgaatttgac ttcagatggt actggaccac accatggatg gtactgtaac
      361 tacgtggaag tcaccgttac cggagctaaa aaacaatgca accagcagtt gttcaccgtg
      421 aatcagtggc tgggcactga tgtttcgccg tataagctaa cggccatcag gaataactgt
      481 aagaacaagt atgagtccgg tgagctaaag cccctttatg attctgaatc attttctata
      541 gttgatgtaa tttaa
//

LOCUS       936 bp mRNA linear PLN 08-MAR-2004
DEFINITION  Chloroplast-specific ribosomal protein.
ACCESSION   AY568722
KEYWORDS
SOURCE      tomato.
  ORGANISM  Lycopersicon esculentum
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; asterids; lamiids; Solanales; Solanaceae; Solanum; Lycopersicon.
REFERENCE   1  (bases 1 to 936)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Identification, accumulation, and functional prediction of novel tomato transcripts systemically up-regulated after fire damage
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 936)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-MAR-2004) Botany, North Carolina State University, Gardner Hall, Campus Box 7612, Raleigh, NC 27695, USA
FEATURES             Location/Qualifiers
     source          1..936
                     /organism="Lycopersicon esculentum"
                     /db_xref="taxon:4081"
     transit_peptide 1..225
                     /note="Predicted chloroplast signal peptide"
     CDS             1..936
                     /codon_start=1
                     /product="Chloroplast-specific ribosomal protein"
                     /translation="MATLSLSPSVGTTFHSLHSYPNGSSSYSSSCPATASPALSLTLS
                     STNSRFLNSAFKMNEINVPVRNRVTKSFGVRMSWDGPLSSVKLILQGKNLELTPAVKD
                     YVEEKLGKAVQKHSHLAREVDVRLSVRGGELGKGPKIRRCEVTLFTKKHGVIRAEEDG
                     ESIYGSIDMVSSIIQRKLRKIKEKDSDRGRHMKGFDRLKVRDPEALLVQEDLETLSQE
                     EEVEDDKSDGFVTEVVRKKSFDMPPLSVNEAIEQLENVDHDFYGFRNEETGEINIVYR
                     RKEGGYGLIIPKEDGKTEKLEPLEVEPEKEPSIAE"
     misc_feature    256..564
                     /note="Sigma 54 modulation protein domain"
BASE COUNT 281 a 170 c 232 g 253 t
ORIGIN
        1 atggcgactc tttccctttc cccttccgtg ggaacaactt ttcactctct ccatagctac
       61 ccaaatggtt cctcatccta ttcttcttct tgtcccgcta ctgcttctcc agctttgtca
      121 ctgacattgt catctaccaa ttcacgattt ttaaattcag ctttcaagat gaatgaaatt
      181 aatgttcctg tcaggaatag ggtgacaaaa tcctttgggg tccggatgtc ttgggatggt
      241 ccactttctt ctgttaaact cattcttcaa gggaaaaatc ttgagttaac acctgctgta
      301 aaggactatg tggaagagaa gttgggtaag gcagttcaaa agcacagcca tctagccagg
      361 gaagtggatg ttaggctgtc tgttcgaggt ggagagcttg gaaaaggccc aaaaattcga
      421 agatgtgaag ttactctatt tacgaaaaag catggagtga ttcgtgcaga ggaagacggt
      481 gagtcaattt atggaagtat agatatggta tcatcaatta tacagagaaa gttgcggaaa
      541 attaaggaga aggattcaga ccgtggtcgc cacatgaagg gcttcgatag gctgaaagtc
      601 agggacccag aggcgctgtt agttcaagag gatcttgaaa cactttccca agaggaagaa
      661 gttgaagatg acaagagtga tggctttgtt actgaggttg ttcgtaagaa gtcctttgac
      721 atgccacctt taagtgtcaa tgaagcaatt gaacagctgg aaaatgtcga ccatgacttc
      781 tatggtttcc ggaatgagga aactggtgag attaacatcg tttacagacg aaaagaaggg
      841 ggttatggac ttattattcc aaaggaagat ggtaaaacag agaagttaga gcccttggag
      901 gttgaaccag agaaagaacc gtcgatagca gaataa
//

LOCUS       627 bp mRNA linear PLN 08-MAR-2004
DEFINITION  Alpha/beta fold family protein.
ACCESSION   AY568723
KEYWORDS
SOURCE      tomato.
  ORGANISM  Lycopersicon esculentum
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; asterids; lamiids; Solanales; Solanaceae; Solanum; Lycopersicon.
REFERENCE   1  (bases 1 to 627)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Identification, accumulation, and functional prediction of novel tomato transcripts systemically up-regulated after fire damage
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 627)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-MAR-2004) Botany, North Carolina State University, Gardner Hall, Campus Box 7612, Raleigh, NC 27695, USA
FEATURES             Location/Qualifiers
     source          1..627
                     /organism="Lycopersicon esculentum"
                     /db_xref="taxon:4081"
     CDS             1..627
                     /codon_start=1
                     /product="Alpha/beta fold family protein"
                     /translation="MVNLVEAQKPLLHGLMKLAGIRPHSIEIEPGTIMNFWVPSETII
                     QKTKKNKKITTTTPLSNNQYAISPDSTTEPDPNKPVVVLIHGFAGEGIVTWQFQIGAL
                     TKKYSVYVPDLLFFGGSVTDSSDRSPGFQAECLGKGLRKLGVEKCVVVGFSYGGMVAF
                     KMAEMFPDLVEALVVSGSILAMTDSISTTTLNGLGIFIFFGAAAAYLC"
     misc_feature    235..504
                     /note="Alpha/beta hydrolase domain"
     misc_feature    238..306
                     /note="Predicted transmembrane region"
     misc_feature    439..498
                     /note="Predicted transmembrane region"
     misc_feature    553..621
                     /note="Predicted transmembrane region"
BASE COUNT 181 a 129 c 145 g 172 t
ORIGIN
        1 atggtgaact tggttgaagc acaaaaacca ttgttacatg gcctaatgaa attagctgga
       61 atcagacctc atagtataga gatagaacca ggcacaatta tgaatttttg ggttccttct
      121 gaaaccataa ttcaaaaaac gaagaaaaac aaaaaaatca caaccactac tcctctctcc
      181 aacaaccaat atgctatttc ccctgattcc accaccgaac ccgacccgaa caaacccgtg
      241 gtcgtactaa tccacggctt tgccggcgaa ggaatagtga cgtggcaatt tcaaatcggt
      301 gcattaacta aaaaatactc tgtttatgta ccggacctac ttttcttcgg cggatcagtt
      361 acggatagct ccgatagatc gccgggtttt caagcagagt gtttgggtaa agggctgagg
      421 aaattaggcg tggaaaaatg cgtagtggtt ggatttagtt atggaggaat ggtggcgttt
      481 aagatggcgg aaatgtttcc agatttagtt gaggcgttgg tggtgtctgg atcgatatta
      541 gcgatgactg attccattag cactaccacg ctcaatggtt tggggatttt catcttcttc
      601 ggagctgctg ctgcctacct ctgttaa
//

LOCUS       453 bp RNA linear PLN 08-MAR-2004
DEFINITION  Histidine triad family protein.
ACCESSION   AY568724
KEYWORDS
SOURCE      tomato.
  ORGANISM  Lycopersicon esculentum
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; asterids; lamiids; Solanales; Solanaceae; Solanum; Lycopersicon.
REFERENCE   1  (bases 1 to 453)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Identification, accumulation, and functional prediction of novel tomato transcripts systemically up-regulated after fire damage
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 453)
  AUTHORS   Coker,J.S., Vian,A. and Davies,E.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-MAR-2004) Botany, North Carolina State University, Gardner Hall, Campus Box 7612, Raleigh, NC 27695, USA
FEATURES             Location/Qualifiers
     source          1..453
                     /organism="Lycopersicon esculentum"
                     /db_xref="taxon:4081"
     transit_peptide 1..90
                     /note="Predicted transit peptide (unknown location)"
     CDS             1..453
                     /note="Involved in cell cycle regulation."
                     /codon_start=1
                     /product="Histidine triad family protein"
                     /translation="MIVRRKTPALKVYEDDVCLCILDANPLCFGHSLVIPKSHFTSLQ
                     ETPSSVVAAMSSKLPLISSAVMKATGCDSFNLLVNNGAAAGQVIYHTHIHIIPRKASD
                     CLWTSETLSRCPLKSDEAQKLADGIRENLSISSNIEDSKGQGSSLVVN"
     misc_feature    4..315
                     /note="Histidine triad family domain"
BASE COUNT 127 a 93 c 100 g 133 t
ORIGIN
        1 atgattgttc gacgtaaaac acctgcttta aaggtctatg aggatgatgt atgcctctgc
       61 attttggatg caaacccatt gtgttttggg cactcgcttg tcatcccaaa gtctcatttt
      121 acttctttgc aagaaactcc atcatcagtt gtggctgcca tgagttcaaa attgcccttg
      181 attagcagtg cagtcatgaa agccactggt tgtgattcgt tcaacttgtt agttaacaac
      241 ggggcagcag ctggccaggt tatatatcat acccacattc atataattcc tcgtaaagca
      301 agcgattgcc tctggacttc tgagacctta agtagatgtc cgctgaagtc agacgaggct
      361 cagaaacttg cagatggtat tagagaaaac ttatcaattt cgagcaacat tgaagatagt
      421 aaggggcaag gatcaagtct cgttgtaaac tag
//


Appendix 3: Perspectives on student research experiences in plant biology

Overview

Student research is a vital part of our national science education infrastructure and the essence of inquiry-based learning. Research training in plant biology is especially important now because plants are central to unprecedented 21st-century challenges such as the world food supply, environmental protection, and genetic modification. A series of surveys has been administered to members of the American Society of Plant Biologists to understand four aspects of student research experiences:

- the extent to which plant biologists (and subgroups therein) participate in training student researchers;
- the advantages and disadvantages of research training;
- the effectiveness of various training techniques;
- mentors' perceptions of institutional incentives.

Overall, 89% and 49% of potential mentors have supported undergraduate and high school student researchers, respectively. The average plant biologist trains 1.3 undergraduates and 0.3 high school students per year. Time efficiency appears to be the most important issue for mentor participation and success. For example, the vast majority of comments about disadvantages concern three things: the time senior researchers spend training students, students' limited time, and the effect of training students on lab productivity. Similarly, many respondents who report great success with young researchers mention strategies for saving time, maximizing productivity, and using resources wisely. Although the vast majority of plant biologists find mentoring student researchers rewarding, only 49% and 17% perceive institutional incentives for working with undergraduates and high school students, respectively. To assess educational outcomes from the student perspective, another set of surveys has been administered to undergraduate students in the Botany Department at N.C. State University.
Positive educational outcomes rated especially highly included a greater appreciation for teaching/research, greater initiative toward pursuing a career, an increase in skills, and greater consideration of attending graduate school. Students found the experiences effective at building five "leadership skills": teamwork, problem-solving, getting along with others, analytical skills, and time management. They found the experiences somewhat effective at developing four others: writing, speaking, work ethic, and integrity.


A National Perspective on Mentoring Student Researchers in Plant Biology

Jeffrey S. Coker and Eric Davies

Abstract

Student research is a vital part of our national scientific infrastructure. We surveyed members of the American Society of Plant Biologists to measure participation levels and to understand mentor perspectives on student research experiences in the plant sciences. Overall, plant biologists mentor an average of 1.3 undergraduates and 0.3 high school students per year. Whereas most mentors oversee undergraduates regularly, 9% of the mentoring population hosts more high school students than the other 91% combined. Only 49% and 17% of plant biologists perceive institutional incentives for mentoring undergraduate and high school student researchers, respectively. Both the number of students mentored per year and the percentage of mentors perceiving institutional incentives vary with institutional type and academic rank. Faculty at primarily undergraduate institutions host more undergraduates per year (1.9) and perceive more institutional support for undergraduate research (65%) than faculty at all other institutional types. The highest-ranked institutional incentives were funding, academic credit for both student and mentor, and consideration in promotion/tenure decisions. Over 90% of mentor comments about successful training techniques fell into one of three categories: designing a simple project with clear goals, providing hands-on supervision, and ensuring good communication and explanations.

Introduction

Over the next century, plant biologists will face at least three challenges of great importance to humanity:

- ensuring a sustainable food supply for over 10 billion people;
- developing safe, responsible practices with regard to genetically modified organisms;
- protecting the planet's flora from a multitude of environmental threats.

Dealing with the complexity and breadth of these challenges will require a substantial group of highly trained scientists. Accordingly, we have begun to assess research training in plant biology on a national level to understand which reforms might improve it, in terms of both training practices and institutional support. We define research as any investigation that attempts to make an original intellectual or creative contribution to a discipline. Despite our focus on the plant sciences, we anticipate that our findings will apply to all areas of laboratory-based research.

Research training at the high school and undergraduate level has direct implications for the national economy. Although scientists and engineers in research and development (R&D) constitute less than 1% of the total U.S. workforce, they drive innovations and new technologies that improve health, agriculture, the environment, consumer products, and national defense. As the National Science Foundation (NSF) has stated, “The funding and conduct of R&D has always been viewed as essential to the Nation” (NSF, 2000). The United States spent $265 billion on R&D in the year 2000 alone, $30 billion of which was spent by academic institutions (NSF, 2002). In this context, funding agencies such as NASA and the NSF are expanding student research opportunities (Service, 2002) and considering “integration of research and education” as one of four criteria for reviewing research grants (NSF, 2003). Furthermore, a number of national organizations have recommended the expansion and improvement of efforts to include students in college/university research (Sigma Xi, 1989; NSF, 1996; Howard Hughes Medical Institute, 2002; Council on Undergraduate Research, 2003). The first recommendation of the Boyer Commission on Educating Undergraduates in the Research University (1998) was to “make research-based learning the standard”.

Given the high level of investment and emphasis, studies assessing the prevalence and/or quality of research training experiences are surprisingly scarce. The vast majority of “assessment” involves the self-assessment of particular programs for major funding agencies. Although self-assessments are important for improving individual programs and training techniques, they are difficult to compare with one another and may have little value outside the funding agency. Furthermore, these assessments give few clues about the research training habits of mentors without major funding (a large proportion of mentors). Thus, the quantity and quality of student research experiences on a national level remain uncertain. Such uncertainty is a concern, given that the number of scientific/technical articles published by American biologists dropped 22% between 1992 and 1999 (NSF, 2002).

Most of the published literature on research training, though rich in examples of specific programs, has similar limitations. Over the last decade, publications on research training have fallen into three major categories: non-specific large-scale accounts (Seago Jr., 1992; Austin, 1997; Schowen, 1998; Druger, 1998; Craig, 1999; Levesque and Wise, 2001), descriptions of particular programs/courses (Ortez, 1994; Heppner, 1996; Nikolova Eddins et al., 1997; Chaplin et al., 1998; Krasny, 1999; McLean, 1999; Henderson and Buising, 2000; Boersma et al., 2001; Hutchison and Atwood, 2002), and descriptions of particular training methods (Beer, 1995; Durso, 1997; Lewis et al., 2002; Griffin et al., 2003). Recently, there have also been several surveys of larger populations of student researchers in chemistry and biology (Mabrouk and Peters, 2000), psychology (Landrum and Nelsen, 2002), and medicine (Solomon et al., 2003), as well as a survey of mentors in plant biology (Coker and Davies, 2002) and an institutional survey of liberal arts colleges (Research Corporation, 2001). In the study presented here, we synthesize data from mentors in order to understand the plant biology research training landscape in broad terms.

We have administered a series of surveys via the Education Committee of the American Society of Plant Biologists (ASPB; www.aspb.org). The ASPB is a professional organization that promotes the interests of plant scientists and publishes two well-respected journals, Plant Physiology and The Plant Cell. Its members work in a diverse array of government, industry, and academic environments across six continents and are largely representative of laboratory-based plant biology researchers, although most are college and university faculty in the United States.

Our first survey found that 89% and 49% of ASPB members have supported undergraduate and high school student researchers, respectively, and respondents discussed the advantages and disadvantages of supporting student researchers (Coker and Davies, 2002). In the current report, we present findings from a second survey concerning the numbers of undergraduate and high school students mentored and mentor perceptions of institutional support. Respondents also indicated which training techniques, research project ideas, and institutional incentives they have found to be effective and ineffective.

Materials and Methods

Data collection

Two mass emails were sent via the ASPB Education Committee to everyone on the society’s membership list (fall 2001). These emails briefly explained the survey and referred members to a website where the survey was posted. We received responses from about 7% of the society (338 members out of around 5,000). Because respondents were self-selected, our sample might have been a biased subgroup of the whole society, containing a disproportionate number of members who are especially interested in student research. To explore this possibility, we administered the quantitative portion of the survey to 50 randomly selected ASPB members at the 2002 national meeting in Denver.

The survey instrument contained several questions that have unknown, yet quantifiable, answers regarding student research mentoring.

• How many years have you been in a position where you could host high school and/or undergraduate researchers? A) 0-5 B) 5-10 C) 11-15 D) 16-25 E) Over 25
• About how many high school researchers have you supported in your career (if any)? A) 0-5 B) 5-10 C) 11-15 D) 16-25 E) Over 25
• About how many undergraduate researchers have you supported in your career (if any)? A) 0-5 B) 5-10 C) 11-15 D) 16-25 E) Over 25
• Does your institution give any incentives for you to mentor undergraduate/high school research? Undergraduate: Yes / No; High school: Yes / No

It also contained five open-response items asking for the following:

• Training techniques or research projects that work well with beginning researchers
• Training techniques or research projects that do not work well with beginning researchers
• Institutional incentives for mentoring student researchers which are most effective
• Institutional incentives for mentoring student researchers which are least effective
• Incentives not available but which would be appealing to mentor, student, or administration

Finally, several questions allowed us to gather the following demographic information for correlative analysis (Table 1):


• Type of institution
• Position at that institution
• Gender
• Ethnicity
• Country of residence

Data were grouped and summarized for correlative analysis relative to numbers of undergraduates mentored (Table 2) and perceptions about institutional support (Table 3).

Quantitative analyses

The survey data are discrete (not continuous) in that each respondent is classified into the various categories in Tables 2 and 3, so we tested hypotheses using chi-squared statistics. We constructed r × c contingency tables to calculate a chi-squared statistic to test each of our null hypotheses. Our null hypotheses were always that there is no relationship between the row and column categories in Tables 2 and 3. Following standard convention, chi-squared values corresponding to a probability of 0.05 or less were considered significant.

For example, we tested the null hypothesis “The number of undergraduates mentored is independent of the type of institution” by building an r × c table with r (rows) being types of institutions and c (columns) being categories of number of undergraduates mentored. We then calculated a chi-squared contribution for each cell in the table based on its deviation from an expected value (the row total multiplied by the column total, divided by the overall total). Because the sum of all chi-squared contributions in the table corresponded to a p-value less than 0.05, we rejected the null hypothesis; in other words, the number of undergraduates trained could depend on the type of institution. We then proceeded to test more specific hypotheses, such as “The number of undergraduates mentored is independent of whether the mentor is an academic or a non-academic.” Our final conclusions were based only on r × c tables with expected values of at least 5 in each cell, since smaller expected values can make chi-squared statistics unreliable. Thus, because of low sample sizes, we do not draw separate conclusions about mentors from government, industry, and the private sector; instead, we pooled these groups into a “non-academics” category. For the same reason, we cannot draw conclusions involving mentor nationality or ethnicity.
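A minimal sketch of the chi-squared test of independence described above, written in standard-library Python. It is an illustration under stated assumptions, not the authors' analysis code: the function names are our own, the counts are hypothetical placeholders rather than values from Table 2, and no Yates continuity correction is applied. Each expected count is the row total times the column total divided by the grand total, and the p-value comes from the closed-form chi-squared upper-tail probability for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 case (erfc) and add the recurrence terms.
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def chi2_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts
    (no Yates continuity correction). Returns (statistic, df, p, expected)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Expected count for each cell: row total x column total / grand total.
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df), expected


# Hypothetical counts (NOT the survey data): rows = institution type,
# columns = the five undergraduate-count categories.
observed = [
    [12, 10, 8, 9, 21],    # e.g. PUI faculty
    [40, 28, 17, 15, 20],  # e.g. university faculty
]
stat, df, p, expected = chi2_independence(observed)
min_expected = min(min(row) for row in expected)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
print("every expected count >= 5:", min_expected >= 5)
```

In practice a statistics library routine (for example, SciPy's `chi2_contingency`) would normally be used instead; writing the steps out simply makes the expected-count and minimum-cell-size checks explicit.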

Qualitative analyses

Each open-response item was analyzed by grouping similar comments together and counting the total number of comments in each group. If a respondent mentioned multiple techniques/incentives in the same comment, the comment was included in the count for all applicable groups. Groups were then ranked according to their total counts.
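The counting rule above, where a comment that names several techniques or incentives counts toward every applicable group, can be sketched as a multi-label tally. The group labels and comments below are hypothetical, the function name is our own, and the coding of comments into groups (the step that requires human judgment) is assumed to have been done already.

```python
from collections import Counter


def rank_groups(coded_comments):
    """Count how many comments fall into each group and rank groups by count.

    Each comment is a list of the group labels a coder assigned to it; a
    comment mentioning several groups is counted once in each of them.
    """
    counts = Counter()
    for groups in coded_comments:
        counts.update(set(groups))  # set(): count a comment at most once per group
    return counts.most_common()


# Hypothetical coded comments (labels are illustrative, not survey data).
comments = [
    ["simple project", "supervision"],
    ["supervision"],
    ["communication", "simple project"],
    ["simple project"],
]
for group, n in rank_groups(comments):
    print(group, n)
```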

Results and Discussion

Demographics

The majority of respondents were professors at colleges and universities in the United States (Table 1). About 65% of the whole sample worked at either land-grant or other research universities, whereas 20% worked at primarily undergraduate institutions (PUIs). Those representing PUIs were mostly from liberal arts colleges (85%). Most of those from academia were assistant, associate, or full professors, although a few post-docs and graduate students (nearly all from universities) also completed the survey.

Respondents not in academia (13%) were a mix of researchers from government, industry, and the private sector.

The length of time respondents had been in a position to mentor was well distributed. A little more than half (56%) were in the first 10 years of their mentoring career, whereas the rest were evenly distributed among the 11-15, 16-25, and over-25-year categories. In general, those in academia were evenly spread throughout these categories, whereas non-academics were more frequently in the 0-5 year category.

The vast majority (74%) worked in the United States, whereas the remaining 26% were from 26 different countries on 6 continents. Most were of Caucasian (68%) or Asian (11%) descent. Despite the international diversity of the sample, representation was relatively low for Hispanics and Latinos (4.1%) and African-Americans (0.3%).

Sampling concerns

To check whether respondents to the online survey were a biased subgroup of the ASPB membership, we administered the quantitative portion to 50 randomly selected ASPB members at an ASPB national meeting. Because these randomly administered surveys gave statistics that were very similar to those from the online survey (data not shown), we conclude that the online survey provided a reasonably representative sample of the whole population.

Numbers of undergraduates trained

The percentages of respondents who have mentored various numbers of undergraduates are shown in Table 2. Overall, 30%, 22%, 14%, 13%, and 22% of respondents mentored 0-5, 5-10, 11-15, 16-25, and >25 undergraduates, respectively. Table 2 also shows percentages for various respondent groups, including type of institution, academic position, years in a position to mentor, number of high school students mentored, and gender. For purposes of comparing groups, the total number of students mentored is less important than the rate of mentoring (the number of students per mentor year), because groups differ in how long their members have been in a position to mentor.

To estimate the number of undergraduates trained per mentor, we assumed that the average response in any given category equals the middle of that category’s range (e.g., someone who marked the “5-10” category has been a potential mentor for 7.5 years). To be conservative, we used 1.5 students as the average for the “0-5” category of undergraduates mentored, and estimated that those who have mentored “Over 25” students average 42 students across the population (derived from a linear extrapolation). This extrapolation assumes that those who reach the “Over 25” student category in a given length of time continue to mentor at the same rate for the rest of their career. We also assumed that those who have been potential mentors for “Over 25” years average 30 years. Based on these estimates, we calculate the average number of undergraduates per mentor to be 15 (Table 2). Since the average length of mentoring career in the population was 12 years, the average number of undergraduates per mentor year is about 1.3 (Table 2).
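The estimate can be reproduced approximately from the overall percentages reported for Table 2. The sketch below assumes midpoints of 13 and 20.5 for the “11-15” and “16-25” categories (our reading of “the middle of that category’s scale”) and uses the stated values of 1.5 and 42 for the end categories and 12 years for the average career length; the helper name is our own. Because the published percentages are rounded (they sum to 101%), the weighted mean comes out near 15.7 rather than exactly 15, but the per-year rate still rounds to 1.3.

```python
def weighted_mean(shares, values):
    """Mean of category representative values, weighted by share of respondents."""
    return sum(s * v for s, v in zip(shares, values)) / sum(shares)


# Overall percentages of respondents per undergraduate-count category, as
# reported in the text: 0-5, 5-10, 11-15, 16-25, Over 25.
shares = [30, 22, 14, 13, 22]
# Representative value per category: 1.5 for "0-5" (conservative),
# assumed midpoints for the middle categories, and 42 for "Over 25".
students = [1.5, 7.5, 13, 20.5, 42]

mean_students = weighted_mean(shares, students)
mean_career_years = 12  # average mentoring-career length reported in the text
rate = mean_students / mean_career_years
print(f"students per mentor ~ {mean_students:.1f}")
print(f"students per mentor year ~ {rate:.1f}")
```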

The relationships between length of mentoring career and the number of undergraduates trained are shown in Figure 1. In general, most mentors seem to be active throughout their careers, as shown by the steady increase in the numbers of undergraduates trained over time (Fig. 1). Another overall trend is that, within any “length of mentoring career” category, there is substantial variation in the numbers of students trained (Fig. 1; also reflected by error bars in Figs. 2 and 3). This is likely due to the broad spectrum of job responsibilities (and motivations) across the research community.

As one would expect, the total number of undergraduates mentored increases with academic rank. On average, graduate and post-doctoral students mentored 2.2 and 3.7 total students, respectively (Fig. 2), which equates to 0.6-0.7 students per mentor year (Table 2). Assistant, associate, and full professors mentored 10, 19, and 27 total students, respectively (Fig. 2). Whereas there is no significant difference between the mentoring rates of assistant and associate professors (1.9-2.0 undergraduates per year), the rate of full professors is significantly lower (1.3 undergraduates per year). This seems to indicate less involvement in research mentoring among full professors. There are several possible explanations for this trend, including a shift toward administrative or teaching duties over time, a drop in productivity among faculty after tenure, greater selectivity in recruiting students, and a realization over time that mentoring is not rewarded professionally. Another explanation could be, as one respondent said, “Since they (student researchers) are extremely time-intensive and spend at least their share of supply money from my grant, I have had fewer as my career progresses.” On the other hand, it is also possible that student research has been emphasized more in recent years, making the overall mentoring rate higher among younger faculty even if full professors have mentored as many students recently.

The type of institution had a significant effect on the numbers and rates of undergraduates trained. Current faculty at universities averaged 1.3 undergraduates per year, while faculty at PUIs averaged 1.9 (Table 2). This difference appears to result from three factors: full professors at PUIs mentoring slightly more students than those at universities (Fig. 2); full professors making up a higher percentage of the total population than assistant or associate professors (Table 1); and PUI professors having been in a position to mentor 5-6 fewer years than those at universities (Table 2), which raises their per-year mentoring rate.

Researchers from government agencies and industry averaged 0.6-0.7 undergraduates per year. Thus, the data indicate that the average PUI professor mentors around 45% more undergraduates than faculty at universities and about three times as many as researchers in government/industry. Although we did not necessarily expect these results, they are understandable given the greater focus of PUIs on undergraduate education and the time that university professors must spend directing graduate and post-doctoral students (who, in turn, also mentor undergraduates). These findings are also consistent with a recent study of 136 liberal arts colleges, which found that the number of students engaged in some type of research rose 70% in the past decade (Research Corporation, 2001).
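These comparisons can be checked against the reported per-year rates (1.9 for PUI faculty, 1.3 for university faculty, and 0.6-0.7 for government/industry researchers). The sketch below is plain arithmetic with a helper name of our own; it keeps “percent more” and “times as many” separate because they are easy to conflate: a rate of 1.9 versus 0.6-0.7 is roughly 2.7 to 3.2 times as many, or about 170% to 220% more.

```python
def percent_more(a, b):
    """How many percent larger a is than b."""
    return (a / b - 1) * 100


pui_rate, university_rate = 1.9, 1.3  # undergraduates per mentor year
gov_industry_rates = (0.6, 0.7)       # reported range for government/industry

print(f"PUI vs university: {percent_more(pui_rate, university_rate):.0f}% more")
for r in gov_industry_rates:
    print(f"PUI vs government/industry at {r}: {pui_rate / r:.1f} times as many "
          f"({percent_more(pui_rate, r):.0f}% more)")
```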

Number of high school students trained

Using the same logic as above, we estimate the number of high school student researchers per mentor to be 3.6, and the average number per mentor year to be about 0.3.

Unlike undergraduate research mentoring, high school mentoring is driven by a small group of mentors. Over half of the high school researchers mentored by professional plant biologists are guided by 9% of the mentor population. Furthermore, the medians of all “length of mentoring career” categories correspond to 0-5 high school students. For example, 96% of mentors in the 0-5 years category, as well as 62% in the 25+ years category, have trained 5 or fewer high school students.
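The text does not say exactly how the 9% figure was derived. One plausible way to compute such a concentration figure, sketched here as an assumption rather than the authors' procedure, is to convert each response to a representative count (as was done for undergraduates), sort mentors from most to least prolific, and find the smallest fraction of mentors who together account for more than half of all students. The function name and the per-mentor counts below are hypothetical.

```python
def smallest_fraction_covering(counts, share=0.5):
    """Smallest fraction of mentors who together account for more than
    `share` of all students, taking the most prolific mentors first."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    running = 0.0
    for i, c in enumerate(ordered, start=1):
        running += c
        if running > share * total:
            return i / len(ordered)
    return 1.0  # only reached if total is zero


# Hypothetical representative counts for ten mentors (not survey data).
mentors = [42, 20.5] + [1.5] * 8
print(f"{smallest_fraction_covering(mentors):.0%} of mentors host over half the students")
```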

What are the most prolific mentors doing?

Around 10% of mentors trained over 25 undergraduates in their first 10 years (Fig. 1), suggesting that the most prolific mentors train at least 2-3 undergraduates per year and could train nearly 100 students in their career. A few mentors make significant efforts to train both undergraduate and high school students. For example, 10% of 25+ year mentors have trained over 25 high school students, and 4 of these 6 individuals also mentored more than 25 undergraduates. One respondent commented, “I train 3 to 12 undergraduate students each semester. Seven have been coauthors on published papers since 1997.” On the other hand, a typical mentor trains 1-2 undergraduates per year and no high school students.

We wish to be clear here that numbers of students say nothing about the quality or duration of student research experiences. Whereas getting more students involved in research is advantageous, increasing the number of students trained can be counterproductive if the quality of experiences declines in the process. That said, we believe that many mentors are doing an excellent job with a few students, while others are doing an excellent job with many students.

Effective training techniques and research projects

Over 90% of respondent comments regarding effective training techniques fell into one of the following three categories, each of which had subcategories:

1. Design a project that is simple and has clear goals (38%).
   a. Well-structured
   b. Uses techniques common to the lab
   c. Uses a single technique or set of techniques
   d. Achievable in a short amount of time
2. Provide hands-on supervision (27%).
   a. Partner inexperienced students with experienced students
   b. Have students work in teams
3. Ensure good communication and explanations (27%).
   a. Explain theory, background, and context
   b. Provide clear, written directions
   c. Be available to listen and answer questions

For the most part, comments were remarkably consistent, and many respondents mentioned all three major points. There was only one detectable difference between the comments from various categories of respondents, and it involved providing hands-on supervision. For researchers at universities, hands-on supervision meant working with a graduate student or post-doc (almost every comment said this). For those at PUIs, on the other hand, it meant direct supervision by a professor, working in teams, or partnering inexperienced undergraduates with experienced undergraduates. It has been suggested that the traditional roles of undergraduates, graduate students, and post-docs are blurring because lower-level students are participating more in research and higher-level students are demanding better mentoring (González, 2001). Responses to our survey suggest that the distinction between upper-level students and faculty may also be blurring, at least in the context of research mentoring, since upper-level students are often expected to mentor lower-level students.

Respondent comments about particular research projects that work well were largely a reflection of their own lab work. A wide array of techniques and projects was mentioned, including PCR, DNA sequencing, morphology, histology, mutant screens, enzyme assays, protein purification, cloning, dye uptake into xylem, and computer-based projects. At the same time, no particular technique or project was mentioned noticeably more often than others. The message seems to be that students can successfully perform just about any technique for a research project, so long as the technique is standard in the lab and can be properly overseen. Some respondents also expressed a preference for techniques that are inexpensive and relevant to answering many different research questions.

Ineffective training techniques and research projects

The ineffective training techniques identified by respondents were the opposites of those they had identified as effective. Again, over 90% of comments fell into three classes:

1. Projects that are not simple and lack clear goals (53%).
2. No hands-on supervision (27%).
3. Poor communication and explanations (11%).

Similarly, the subcategories for ineffective techniques were the opposites of those mentioned above for effective training techniques. In describing ineffective techniques and projects, respondents mentioned project design 15 percentage points more often, and communication 16 percentage points less often, than they had when describing effective techniques. Other notable ineffective techniques mentioned by several respondents included using students as technicians (to do nothing but routine tasks) and attempting to mentor students on projects not directly related to other work in the lab.
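The shares compared here are themselves percentages of comments, so their differences are in percentage points rather than percent. A short sketch using the category shares from the two lists of effective and ineffective techniques makes the distinction concrete; the dictionary keys and the helper name are our own labels.

```python
# Shares (%) of comments per category, from the lists of effective and
# ineffective techniques.
effective = {"project design": 38, "supervision": 27, "communication": 27}
ineffective = {"project design": 53, "supervision": 27, "communication": 11}


def compare(before, after):
    """Return (difference in percentage points, relative change in percent)."""
    return after - before, (after - before) / before * 100


for category in effective:
    points, relative = compare(effective[category], ineffective[category])
    print(f"{category}: {points:+d} percentage points ({relative:+.0f}% relative change)")
```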

The common theme for all ineffective training techniques, whether during project design, in the lab, or in providing explanations and interpreting data, was that passive mentoring does not work. In other words, pointing undergraduate or high school students in a general direction and leaving them to figure things out on their own tends to fail. Effective mentoring, on the other hand, involves an active process of project design, goal-setting, hands-on training, and guidance.

There was no consensus on particular laboratory projects that are ineffective for training student researchers. In fact, nearly all projects mentioned as ineffective had been mentioned by someone else as potentially effective. The one exception involved the use of radioactivity-based techniques in student projects, which several respondents discouraged and another described as “illegal” for those 18 and younger in Australia.

Perceptions of institutional incentives

Table 3 shows the percentages of respondents who think their institution gives incentives for mentoring student research. Overall, about one-half (49%) perceived institutional incentives for mentoring undergraduates, and only one-sixth (17%) perceived incentives for mentoring high school students. The vast majority of those respondents with incentives for high school mentoring also have incentives for undergraduate mentoring (88%), whereas only 2% of all mentors have incentives exclusive to high school researchers compared to 32% for undergraduates. Only 14% of all mentors perceive institutional incentives for mentoring both undergraduates and high school students, whereas 46% perceive no incentives for either.

Almost 50% of those at land-grant and other research universities perceive that there are incentives for mentoring undergraduates (Table 3). If only faculty members at these universities are considered (i.e., excluding graduate and post-doctoral students), the percentage rises slightly to 53% (Table 3). In contrast, significantly more faculty at PUIs perceive incentives for undergraduate research (65%). The discrepancy between university faculty and PUI faculty involving undergraduate-related incentives derives mainly from the responses of full professors and partly from those of assistant and associate professors (Fig. 3). Seventy percent of full professors and 57% of associate professors at PUIs responded that they have incentives, compared with 55% and 47% at land-grant universities and other research universities, respectively (Fig. 3). Although almost equal percentages of assistant professors at “other research universities” (67%) and at PUIs (69%) perceived they had undergraduate-related incentives, significantly fewer associate and full professors at “other research universities” (38%) perceived institutional incentives than those at PUIs (55%; Fig. 3). At land-grant universities, the same percentage (55%) of faculty perceived institutional incentives regardless of academic rank (Fig. 3).

Between 16% and 18% of those at universities felt that there are incentives for mentoring high school students (whether or not graduate and post-doctoral students are included in the percentage). By contrast, fewer of those at PUIs (8%) felt that they have incentives to mentor high school researchers (Table 3). This trend was supported by several comments from PUI faculty who said that their specific job is to work with undergraduates, not high school students.

Among government employees, the percentages of those perceiving incentives for undergraduate and high school research mentoring were both 54% (Table 3). In industry and the private sector, very few respondents (less than 15%) perceived incentives for mentoring student researchers (Table 3).

As one might expect, within colleges and universities there seem to be significantly more incentives to support undergraduates than high school students. On the other hand, respondents from government perceive little difference between the incentives available for mentoring undergraduate and high school researchers. We were surprised to find that the group most positive about incentives for high school mentoring was from government, but we must note that the sample size for this group was relatively small (n = 22).
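To put a rough number on that caution, the sketch below computes a Wilson score interval for a proportion. This is not an analysis reported in the study: it assumes 12 of 22 government respondents (which matches the reported 54%) and z = 1.96 for an approximate 95% interval, and the function name is our own. With so few respondents the plausible range is wide, close to 40 percentage points, which is why conclusions about this group are tentative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Assumption: 12 of 22 government respondents, consistent with the reported 54%.
low, high = wilson_interval(12, 22)
print(f"54% of n = 22 -> roughly {low:.0%} to {high:.0%}")
```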


Most effective institutional incentives

From the mentor perspective, the ranking of the three most effective institutional incentives was very clear:

1. Funding
2. Academic credit (for student AND mentor)
3. Consideration in promotion/tenure decisions

Most funding-related comments mentioned general laboratory needs such as supplies, training costs, stipends, and travel. Many noted that funds specifically for students are important, especially in the summer, when students tend to need money. Methods of student payment include stipends, scholarships, fellowships, and free tuition. A small number of mentors also mentioned supplementing mentor salaries as an incentive.

The second-most effective incentive, academic credit, has two separate elements. First, students must receive course credit for their work. Second, mentors (who are usually college faculty) should receive teaching credit proportional to the number of students they mentor, and this credit should have practical value in determining overall workloads. This relates to results from our previous survey, which showed that the biggest disadvantage of mentoring student researchers is that it takes a lot of time (Coker and Davies, 2002). Most mentors will not commit to many students without gaining time somewhere else in their schedules.

The third-most effective incentive is consideration in career advancement decisions. We suspect that part of the reason why assistant professors tend to perceive more institutional incentives than their more experienced colleagues (Table 3) is that they have not yet been through the tenure-review process and thus do not realize how little they may be rewarded for mentoring student researchers. “Institutionalizing” student research must include appropriate acknowledgement for those who put forth considerable mentoring effort.

Other incentives mentioned include the following: co-authorship of papers/posters/presentations by students, being in a culture where research mentoring is expected, having students add to research productivity, administrative support for practical arrangements (housing, parking, etc.), involvement with an honors program, and the expectation that it is (or will become) necessary for research funding.

Least effective institutional incentives

Mentors did not agree on particular incentives that do not work. Nearly every comment dealt with a poor method of implementing or enforcing an incentive that would otherwise have been effective. The common theme of all comments was that mission statements, administrative encouragement, and departmental expectations are not effective in the absence of more tangible institutional incentives.

The most desirable institutional incentive not currently available

When asked to list incentives that are unavailable but which would be appealing to mentor, student, or administration, respondents frequently mentioned all three of the “most effective” incentives listed above. However, the most frequently mentioned incentive was for mentors to receive teaching credit for mentoring student researchers. The following comments represent this opinion:

Academic credit would be most useful and provide for the time necessary for such a worthwhile undertaking.

We receive no "credit" for taking undergraduates in lab. Our work loads are calculated based on course loads, and although undergraduates who do year-long thesis projects with us do sign up with us as taking a course, we receive no credit for teaching such a course.

Recommendations for future studies

It is important to emphasize that the number of students mentored by an individual does not indicate the quality and overall value of those experiences. In fact, a highly effective mentor could retain students longer, and therefore host fewer students in total, than a less effective mentor. For this reason, we caution that individual mentors should not be evaluated solely on the numbers of students trained. Student outcomes, the nature and length of student research projects, and other quality measures should be taken into consideration along with numbers of students trained. Thus, we suggest that future surveys attempt to measure both the quantity and quality of research experiences.

Finally, longitudinal data on student research would allow more informed decisions to be made by teachers, researchers, and administrators. Only with knowledge of long-term trends can research experiences be optimized and evaluated on a national scale. Therefore, we think that replicating this study in 10-20 years would have substantial value.

Acknowledgements

We thank members of the ASPB for their cooperation in filling out surveys and Sophia Clotho for her advice.


References

Austin, C.A. (1997). A survey of final-year undergraduate laboratory projects in biochemistry and related degrees in Great Britain. Biochem. Educ. 25, 12-14.

Beer, R.H. (1995). Guidelines for the supervision of undergraduate research. J. Chem. Educ. 72, 721-722.

Boersma, S., Hluchy, M., Godshalk, G., Crane, J., DeGraff, D., and Blauth, J. (2001). Student-designed interdisciplinary science projects. J. Coll. Sci. Teach. 30, 397-402.

Boyer Commission on Educating Undergraduates in the Research University. (1998). Reinventing undergraduate education: A blueprint for America’s research universities. . Accessed 11 July 2003.

Chaplin, S.B., Manske, J.M., and Cruise, J.L. (1998). Introducing freshmen to investigative research – A course for biology majors at Minnesota’s University of St. Thomas. J. Coll. Sci. Teach. 27, 347-350.

Coker, J.S., and Davies, E. (2002). Involvement of plant biologists in undergraduate and high school student research. J. Nat. Resour. Life Sci. Educ. 31, 44-47.

Council on Undergraduate Research. (2003). The Council on Undergraduate Research. . Accessed 11 July 2003.

Craig, N.C. (1999). The joys and trials of doing research with undergraduates. J. Chem. Educ. 76, 595-597.

Druger, M. (1998). Teaching versus research – An ongoing issue at the college level. J. Nat. Resour. Life Sci. Educ. 27, 134-135.

Durso, F.T. (1997). Corporate-sponsored undergraduate research as a capstone experience. Teaching of Psychology 24, 54-56.

González, C. (2001). Undergraduate research, graduate mentoring, and the university mission. Science 293, 1624-1626.

Griffin, V., McMiller, T., Jones, E., and Johnson, C.M. (2003). Identifying novel helix-loop-helix genes in Caenorhabditis elegans through a classroom demonstration of functional genomics. Cell Biol. Educ. 2, 51-62.

Henderson, L., and Buising, C. (2000). A research-based molecular biology laboratory. J. Coll. Sci. Teach. 30, 322-327.

Heppner, F. (1996). Learning science by doing science. Am. Biol. Teach. 58, 372-374.


Howard Hughes Medical Institute. (2002). Undergraduate science education at research universities. . Accessed 11 July 2003.

Hutchison, A.R., and Atwood, D.A. (2002). Research with first- and second-year undergraduates: a new model for undergraduate inquiry at research universities. J. Chem. Educ. 79, 125-126.

Krasny, M.E. (1999). Reflections on nine years of conducting high school research programs. J. Nat. Resour. Life Sci. Educ. 28, 17-23.

Landrum, E.R., and Nelsen, L.R. (2002). The undergraduate research assistantship: an analysis of the benefits. Teaching of Psychology 29, 15-19.

Levesque, M.J., and Wise, M. (2001). The Elon experience: Supporting undergraduate research across all disciplines. CUR Quarterly, Mar, 113-116.

Lewis, J.R., Kotur, M.S., Butt, O., Kulcarni, S., Riley, A.A., Ferrell, N., Sullivan, K.D., and Ferrari, M. (2002). Biotechnology apprenticeship for secondary-level students: Teaching advanced cell culture techniques for research. Cell Biol. Educ. 1, 26-42.

Mabrouk, P.A., and Peters, K. (2000). Student perspectives on undergraduate research experiences in chemistry and biology. CUR Quarterly, Sept, 25-33.

McLean, R.J.C. (1999). Original research projects – A major component of an undergraduate microbiology course. J. Coll. Sci. Teach. 29, 38-40.

National Science Foundation. (1996). Shaping the future: New expectations for undergraduate education in science, mathematics, engineering, and technology. . Accessed 11 July 2003.

National Science Foundation. (2000). Science and engineering indicators-2000 (NSB 00-1). . Accessed 11 July 2003.

National Science Foundation. (2002). Science and engineering indicators-2002 (NSB 02-1). . Accessed 11 July 2003.

National Science Foundation. (2003). Grant proposal guide (NSF 03-041). . Accessed 11 July 2003.

Nikolova Eddins, S.G., Williams, D.F., Bushek, D., Porter, D., and Kineke, G. (1997). Searching for a prominent role of research in undergraduate education: Project Interface. J. Excellence in College Teaching 8, 69-81.

Ortez, R.A. (1994). Investigative research in nonmajor freshman biology classes. J. Coll. Sci. Teach. 23, 296-300.

Research Corporation. (2001). Academic Excellence: The Sourcebook. . Accessed 11 July 2003.

Schowen, K.B. (1998). Research as a critical component of the undergraduate educational experience. K.B. Schowen (Ed.), Washington, D.C.: National Academy Press. pp 73-81.

Seago, J.L., Jr. (1992). The role of research in undergraduate instruction. Am. Biol. Teach. 54, 401-405.

Service, R.F. (2002). New lure for young talent: extreme research. Science 297, 1633-1634.

Sigma Xi. (1989). An exploration of the nature and quality of undergraduate education in science, mathematics and engineering. A report of the National Advisory Group of Sigma Xi, The Scientific Research Society.

Solomon, S.S., Tom, S.C., Pichert, J., Wasserman, D., and Powers, A.C. (2003). Impact of medical student research in the development of physician-scientists. J. Investig. Med. 51, 149-156.

Table 1. Population demographics of respondents to a survey of the American Society of Plant Biologists (ASPB).

Category                                   % of pop.

Land-grant university                           41.0
Other research university                       24.2
Primarily undergraduate institution             20.1
Government                                       6.5
Industry                                         3.8
Institute, museum, private organization          2.4
Unknown                                          2.1

Full professor                                  31.4
Assistant professor                             17.5
Associate professor                             16.6
Post-doc                                         9.8
Graduate student                                 8.3
Research director                                5.0
Research scientist                               3.8
Lab manager                                      3.6
Retired professor                                2.1
Other                                            1.2
Unknown                                          0.9

0-5 yrs in a position to mentor                 30.0
5-10 yrs                                        25.5
11-15 yrs                                       14.8
16-25 yrs                                       14.2
Over 25 yrs                                     15.4

United States                                   74.3
Canada                                           4.1
Japan                                            2.7
Germany                                          2.4
Australia                                        1.5
Mexico                                           1.2
Taiwan                                           1.2
United Kingdom                                   0.9
Portugal                                         0.9
18 other countries                               7.1
Unknown                                          3.8

Caucasian                                       67.5
Asian                                           10.7
Hispanic/Latino                                  4.1
African-American                                 0.3
Other                                            0.6
Unknown                                         16.9

Females                                         31.6
Males                                           64.6
Unknown                                          3.8

Table 2. Percentages of respondents who have mentored various numbers of undergraduates. Estimates of average number of undergraduates mentored and average number of undergraduates per mentor year are based on these percentages. Shaded regions show the most important trends that are supported by reasonable sample sizes (n). UG=Undergraduate researchers; HS=High school researchers.

Column key: n = number of respondents; 0-5 to >25 = % of respondents who mentored that many UG; Avg = estimated average # UG per mentor; Total = estimated total # UG mentored; Yrs = average length of mentoring career (yrs); UG/yr = estimated average # UG per mentor year.

                                n    0-5   5-10  11-15  16-25    >25    Avg  Total    Yrs  UG/yr
Overall                       339   29.6   21.8   14.0   13.1   21.5     15   5247   12.0    1.3

Institution
  Land-grant univ.            139   33.3   18.4   12.8   12.1   23.4     16   2186   12.9    1.2
  (only current faculty)       94   16.3   20.2   16.3   16.3   30.8     20   1879   15.4    1.3
  Other research univ.         82   22.5   11.3   18.8   23.8   23.8     18   1500   13.1    1.4
  (only current faculty)       58   11.3   22.6   11.3   24.2   30.6     21   1216   16.1    1.3
  PUIs                         68   13.0   24.6   21.7   15.9   24.6     18   1242    9.7    1.9
  Government                   22   40.9   36.4   13.6    4.5    4.5      8    172   11.6    0.7
  Industry                     13   69.2    7.8   15.4    0.0    7.8      7     89   12.3    0.6
  Inst., mus., priv. org.       8   62.5   25.0    0.0    0.0   12.5      8     65    7.9    1.0

Academic position
  Graduate student             28   85.2   14.8    0.0    0.0    0.0      2     67    3.8    0.6
  Post-doc                     33   71.9   21.9    6.3    0.0    0.0      4    116    5.3    0.7
  Assistant professor          59   27.1   37.3   13.6   18.6    3.4     10    593    5.3    1.9
  Associate professor          56    7.1   23.2   21.4   25.0   23.2     19   1079    9.8    2.0
  Full professor              106    4.8   12.4   16.2   17.1   49.5     27   2887   21.3    1.3

Years could mentor
  0-5                         101   66.0   25.0    8.0    1.0    0.0      4    411    2.5    1.6
  5-10                         86   20.9   26.7   17.4   24.4   10.5     14   1185    7.5    1.8
  11-15                        50   16.0   20.0   18.0   18.0   28.0     19    968   12.5    1.5
  16-25                        48    8.3   16.7   14.6   16.7   43.8     25   1196   20.0    1.2
  Over 25                      52    7.7   13.5   15.4    9.6   53.8     28   1435   30.0    0.9

# of HS mentored
  0-5                         261   32.8   23.6   14.3   13.9   15.4     13   3471   10.4    1.3
  5-10                         36   16.7   13.9     25    8.3   36.1     21    765   17.6    1.2
  11-15                        15     20     20      0   13.3   46.7     24    361   19.0    1.3
  16-25                         8   12.5   12.5   12.5      0   62.5     29    232   18.4    1.6
  Over 25                       8      0   12.5      0   12.5     75     35    280   27.1    1.3

Gender of mentor
  Male                        219   27.4   23.7     13     13   22.8     16   3502   13.1    1.2
  Female                      107   31.8   18.7   16.8     14   18.7     15   1566   10.1    1.4
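The two derived columns of Table 2 appear to be linked to the other columns by simple ratios. The estimated average number of undergraduates per mentor equals the estimated total divided by n. The estimated number per mentor year equals that average divided by the average length of the mentoring career. The Python sketch below checks this reading against a few rows of the table. It is our reconstruction, not a documented procedure, and the function name derived_columns is hypothetical. It also does not attempt to reproduce how the totals were estimated from the binned percentages.

```python
from __future__ import annotations

# A minimal sketch of our reading of Table 2 (not the authors' documented method):
# derive the per-mentor and per-mentor-year columns from n, the estimated total
# number of undergraduates mentored, and the average mentoring-career length.


def derived_columns(n: int, total_ug: int, career_yrs: float) -> tuple[int, float]:
    """Return (avg UG per mentor, avg UG per mentor year), rounded as in Table 2."""
    per_mentor = total_ug / n                    # e.g. 5247 / 339 is about 15.48
    per_mentor_year = per_mentor / career_yrs    # ratio of two group averages
    return round(per_mentor), round(per_mentor_year, 1)


# Rows copied from Table 2: (n, estimated total UG mentored, avg career length in yrs)
rows = {
    "Overall": (339, 5247, 12.0),
    "PUIs": (68, 1242, 9.7),
    "Full professor": (106, 2887, 21.3),
    "Years could mentor 0-5": (101, 411, 2.5),
}

for label, (n, total, yrs) in rows.items():
    print(label, derived_columns(n, total, yrs))
```

The sketch uses the ratio-of-averages form because it reproduces the published values. An average of individual respondents' rates could differ.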

Table 3. Respondent perceptions of institutional incentives for mentoring student researchers. Shaded regions show the most important trends that are supported by reasonable sample sizes (n). UG=Undergraduate researchers; HS=High school researchers.

                                n  % Yes for UG  % Yes for HS
Overall                       339          49.2          16.5

Institution
  Land-grant univ.            139          47.5          17.2
  (only current faculty)       94          55.3          18.1
  Other research univ.         82          48.1          13.0
  (only current faculty)       58          51.7          10.7
  PUIs                         68          65.2           8.1
  Government                   22          54.5          54.5
  Industry                     13           7.7           7.7
  Inst., mus., priv. org.       8          12.5           0.0

Academic position
  Graduate student             28          38.5          23.1
  Post-doc                     33          29.0           9.7
  Assistant professor          59          67.2          17.3
  Associate professor          56          50.0           7.1
  Full professor              106          55.2          16.7

Years could mentor
  0-5                         101          40.8          12.2
  5-10                         86          47.0          12.0
  11-15                        50          68.0          28.0
  16-25                        48          45.8          14.6
  Over 25                      52          53.8          17.3

# of UG mentored
  0-5                          99          30.4          15.9
  5-10                         72          68.5          24.7
  11-15                        47          46.8          12.8
  16-25                        44          54.5           6.8
  Over 25                      72          52.8          12.5

# of HS mentored
  0-5                         261          49.2          11.7
  5-10                         36          47.2          25.0
  11-15                        15          66.7          40.0
  16-25                         8          75.0          50.0
  Over 25                       8          25.0          12.5

Gender of mentor
  Male                        219          46.5          18.1
  Female                      107          53.8          13.1

[Figure 1 graphic: 3-D bar chart of % of respondents (0-70) by undergraduates mentored (0-5, 6-10, 11-15, 16-25, over 25) and length of mentoring career (0-5, 6-10, 11-15, 16-25, over 25 yrs).]

Figure 1. Percentages of plant biologists who mentored various numbers of undergraduates in different “length of their mentoring career” categories. For example, of the plant biologists who were in a position to mentor for 0-5 years, over 60% mentored 0-5 undergraduates.

[Figure 2 graphic: bar chart of total undergraduates mentored (axis 0-40) for grad students, post-docs, assistant professors, associate professors, and full professors at land-grant universities, other research universities, and PUIs.]

Figure 2. Total number of undergraduates mentored by plant biologists of different academic ranks at land-grant universities, other research universities, and primarily undergraduate institutions (PUIs).

[Figure 3 graphic: bar chart of % perceiving institutional incentives (axis 0-90) for grad students, post-docs, assistant professors, associate professors, and full professors at land-grant universities, other research universities, and PUIs.]

Figure 3. Percentages of plant biologists of different academic rank at land-grant universities, other research universities, and primarily undergraduate institutions (PUIs) who perceive institutional incentives for mentoring undergraduate researchers.


Evaluation of Teaching and Research Experiences Undertaken by Botany Majors at N.C. State University

Jeffrey S. Coker and C. Gerald Van Dyke

Department of Botany, N.C. State University, Raleigh, North Carolina 27695

Abstract

Many science departments require undergraduate students to complete either a teaching or research experience. We have developed a survey instrument to measure outcomes of student teaching and research experiences from the student perspective. Our results in the Botany Department at N.C. State University show that those doing research are involved mainly in data collection and analysis, whereas those who are teaching are mainly involved with hands-on laboratory instruction. Nearly all students rated their experiences as very good overall and would recommend them to other students. Several positive educational outcomes were rated especially high, including a greater appreciation for teaching/research, greater initiative towards pursuing a career, an increase in skills, and greater consideration for attending graduate school. Students found that the experiences were effective at building 5 "leadership skills", namely team-work, problem-solving, getting along with others, analytical skills, and time-management, and somewhat effective at developing 4 others, namely writing, speaking, work ethic, and integrity. Students rated academic-related outcomes relatively low overall, suggesting that motivation to make better grades or to take different courses changed little as a result of research or teaching experiences.

Introduction

Experiential learning in the forms of teaching and research can be extremely rewarding for undergraduate students. These experiences allow students to put classroom knowledge into practice and explore potential career paths. Teaching and research settings frequently present rich opportunities to build leadership skills such as team-work, problem-solving, getting along with others, analytical skills, time-management, writing, speaking, work ethic, and integrity. Perhaps most importantly, both teaching and research pose significant, open-ended challenges to students that provide opportunities for high achievement and excellence.

An increased emphasis has been placed on experiential learning in recent years, resulting in a greater need for assessment. Funding agencies such as NASA and the NSF are expanding student research opportunities (Service, 2002), and the NSF considers "integration of research and education" as 1 of 4 criteria used to review scientific research grants (NSF, 2003). Furthermore, a number of national organizations have recommended expanding and improving efforts to include undergraduates in college/university research (NSF, 1996; Boyer Commission, 1998; Howard Hughes Medical Institute, 2002). Similarly, student-assisted teaching has been strongly advocated (Miller et al., 2001), and most laboratory instruction in introductory biology at U.S. universities is carried out by teaching assistants (Sundberg and Armstrong, 1993).

Recently, there have been surveys of student researchers in chemistry and biology (Mabrouk and Peters, 2000), psychology (Landrum and Nelsen, 2002), and medicine (Solomon et al., 2003), as well as a national survey of mentors in plant biology (Coker and Davies, 2002) and an institutional survey of liberal arts colleges (Research Corporation, 2001). Previous authors have also described student research projects in particular courses (Chaplin et al., 1998; McLean, 1999; Henderson and Buising, 2000). We are unaware of any recent survey of undergraduate teaching assistants in the sciences that sought to determine educational outcomes. However, the role of graduate teaching assistants in the sciences has been examined (Druger, 1997; Sundberg et al., 2000), and undergraduate teaching assistants have been surveyed in communication (Socha, 1998) and sociology (Fingerson and Culley, 2001).

Many science departments nationwide require that students complete an out-of-classroom experience in order to graduate. Undergraduates majoring in Botany at N.C. State University are required to complete either a teaching or research experience as part of the departmental curriculum. Such experiences include (but are not limited to) laboratory teaching assignments in botany or biology courses, faculty-supervised research, and off-campus internships. We have developed a survey instrument to measure outcomes of teaching and research experiences in the Botany Department at N.C. State. We used the results to determine what students did during their research/teaching experiences, what the educational outcomes were, and how successful the requirement has been overall. The results will also be used to improve experiences and to better advise students on which experiences to pursue in the future.

Methods

The survey instrument developed for assessing teaching and research experiences consisted of 60 multiple-choice items and 16 open-response items. For those who may be interested in administering similar surveys at their institutions, we have posted this survey at www.cals.ncsu.edu/botany/faculty/gvandyke/undergraduatesurvey.html.

Botany majors at N.C. State University were asked individually to complete the survey after they had finished a research or teaching experience. Most students took about 15 minutes to complete the survey. A total of 25 surveys were completed between fall 2002 and spring 2004, covering student experiences over a 3-year period (2001-2004). These respondents constitute most of the students who graduated from the Botany Department over this period.

Results and discussion

Overview of the students

The 25 students who completed surveys were Botany majors with an average GPA of 3.5 (ranging from 2.1 to 4.0). Students major in Botany at N.C. State University for many different reasons. The Botany curriculum is structured to allow students to customize their program to fit career objectives. Student interests include space biology, ethnobotany, pharmaceutical aspects of medicinal plants, plant identification (wetlands, rare and endangered plants, forest plants, grasses, etc.), plant ecology, plant systematics, plant pathology, plant physiology, molecular botany and many others. Some majors may even pursue careers in scientific writing.

Overview of research/teaching experiences

Of the 25 students in this survey, 23 had 1 teaching/research experience and 2 had multiple experiences. Nineteen students performed research, 6 taught, and 2 had an experiential internship. These counts sum to more than 25 because 2 students had multiple experiences. Teaching experiences typically involved teaching assistant duties in Introductory Botany laboratories at N.C. State University. Research was performed in a broad array of settings, including research labs on campus, Syngenta, BASF, Baylor College of Medicine, Reynolda Gardens at Wake Forest University, the U.S. National Arboretum, national forests, and the U.S. Department of Agriculture.

Typical teaching experiences occupied 7-10 hours per week for 1-2 semesters, and ranged from 3-6 hours per week for 1 semester to 10 hours per week for 3 semesters. Typical research experiences during a school year occupied 10-20 hours per week for 2 semesters, whereas typical summer research experiences were 40 hours per week for the entire summer (9-12 weeks). The extent of research experiences ranged from 8-10 hours per week for 1 semester to 10 hours per week for 6 semesters (including summer work).

Levels of involvement in specific activities

Figures 1 and 2 show the levels of student involvement in teaching- and research-specific activities, respectively. The most prevalent teaching activities were related to hands-on laboratory instruction, including set-up (3.2), brief presentations (3.8), and other routine tasks (Fig. 1). Students were somewhat involved in other educational activities such as writing objectives (1.8), developing course material (1.8), writing exams (1.8), and grading exams (2.8). Few or no students were involved in traditional professorial duties such as giving full-length lectures (1.2) or performing teaching research (1.0).

Students who participated in research reported being most involved in obtaining data, including performing experiments, collecting data, and analyzing data (Fig. 2). Moving from top to bottom along the y-axis of Figure 2 represents a typical progression of activities in a professional research setting. Students reported being somewhat involved in early research stages such as generating hypotheses (2.3) and designing experiments (2.5), and also in late stages such as interpreting results (2.5) and drawing conclusions (2.2). The lowest-ranking categories were more advanced activities that demand a greater time commitment, especially involvement in the grant process (1.3) and presentation of research (1.1-1.8).

Nevertheless, there was at least some student involvement in all stages of research (Fig. 2).
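Figures 1 to 3 report each item as a mean rating with standard-error bars. A minimal Python sketch of that summary is shown below. It assumes the standard error of the mean was computed in the usual way, as the sample standard deviation divided by the square root of n; the figure captions do not state the formula. The ratings are hypothetical and are not data from this survey.

```python
from __future__ import annotations

import math
import statistics


def mean_and_se(responses: list[int]) -> tuple[float, float]:
    """Return the mean and the standard error of the mean for one survey item."""
    n = len(responses)
    mean = statistics.mean(responses)
    # statistics.stdev is the sample standard deviation (n - 1 denominator)
    se = statistics.stdev(responses) / math.sqrt(n)
    return mean, se


# Hypothetical ratings for one activity on the 1-4 involvement scale
# (1 = not involved ... 4 = very involved); NOT data from this survey.
ratings = [4, 4, 3, 4, 3, 4, 4, 3]
m, se = mean_and_se(ratings)
print(f"mean = {m:.3f}, SE = {se:.3f}")
```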

Effects on leadership skills

General questions asked students about the effectiveness of their teaching/research experiences in helping them to "increase skills," to "develop leadership skills," and to "show them the need for developing leadership skills." Students rated these at 4.2, 4.0, and 4.2, respectively, demonstrating that teaching/research experiences were effective to very effective, in general, at building leadership skills (Fig. 3). In further support of this, student comments regarding skills/rewards gained through a teaching/research experience included many references to leadership skills. Among these were public speaking, time-management, self-organization, working with others, "asking for help," "experiencing the dynamics of working with other members of the lab on a project," and "thinking of different ways to accomplish a goal."

The survey also asked students to rate the effectiveness of their teaching/research experiences in developing 9 particular leadership skills. Students rated them as follows: teamwork (4.0), getting along with others (4.0), problem-solving (3.9), time-management (3.9), analytical skills (3.8), speaking (3.4), writing (2.9), integrity (2.9), and work ethic (2.9). Thus, students felt that teaching/research experiences were somewhat effective to effective in developing all 9 leadership skills. None of the ratings for particular skills was quite as high as the ratings for skills in general. This probably reflects the variety of student experiences, each of which enhanced a different set of skills. In other words, all experiences appear to have developed leadership skills, but each developed a different combination of them.
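The Abstract above groups these nine skills into 5 that experiences were "effective" at building and 4 that they were "somewhat effective" at developing. One simple rule that reproduces this grouping is to label each mean by its nearest point on the effectiveness scale defined in Figure 3. The Python sketch below applies that rule. The rule and the function name nearest_label are our assumptions, not a procedure described by the authors.

```python
import math

# Verbal anchors of the effectiveness scale defined in the Figure 3 caption.
SCALE = {
    1: "not applicable",
    2: "not effective",
    3: "somewhat effective",
    4: "effective",
    5: "very effective",
}


def nearest_label(mean_rating: float) -> str:
    """Label a mean rating by its nearest scale point (assumed rule; halves round up)."""
    point = math.floor(mean_rating + 0.5)   # avoids round()'s round-half-to-even rule
    point = min(max(point, 1), 5)           # clamp to the 1-5 scale
    return SCALE[point]


# Mean ratings for the nine leadership skills reported in the text.
skills = {
    "teamwork": 4.0, "getting along with others": 4.0, "problem-solving": 3.9,
    "time-management": 3.9, "analytical skills": 3.8, "speaking": 3.4,
    "writing": 2.9, "integrity": 2.9, "work ethic": 2.9,
}

for skill, rating in skills.items():
    print(f"{skill}: {rating} -> {nearest_label(rating)}")
```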

With regard to the lowest-ranking leadership categories, most students felt that they already had integrity and a good work ethic, so any effects of research/teaching on developing them were minimal. The next two lowest categories, speaking and writing, were pulled down by the ratings of students with research experiences. The survey results are internally consistent on this point. The activities in which research students reported less involvement (grant writing and presenting research) correspond to the skills they reported as less developed by their experience (speaking and writing). This suggests to us that research experiences could be improved by putting more emphasis on speaking and writing, for example by fostering environments in which students present their work.

Effects on academics and broader education

It seems that immediate effects of teaching and research experiences on undergraduate academics were minimal (Fig. 3). Most students found that experiences were either not effective or only somewhat effective at causing them to take different courses (2.5), motivating them to take more difficult courses (2.8), or motivating them to increase their GPA (2.9).

Nevertheless, student comments suggest that their experiences had a large impact on their education in a broader sense. For example, one student wrote, “My research experience on campus has really made my education MUCH more well-rounded. I understand the things we are taught in class because I have done them. And what I learn in class supplements my understanding of techniques.” Most students also reported that their experiences were effective (4.0) at helping them learn more about botany. Taken together, these data suggest that teaching/research experiences were highly educational even though they had little effect on students' course choices or on their motivation to improve their grades.

Although teaching and research did not often cause students to change their undergraduate courses or improve their grades, their experiences were effective (4.0) at causing them to consider further studies such as graduate school (Fig. 3). This is somewhat surprising, since academic achievement is important for admission to graduate school. The pattern of affecting future academic plans while having little effect on current academics may be related to most students having their teaching/research experiences as upperclassmen. It may also reflect their GPAs already being high (average 3.5). It is unclear how teaching/research experiences might affect the academic performance of underclassmen and/or a more random sample of the student population, where academics may have more room for improvement.

Effects on career goals

Students rated teaching/research experiences as somewhat effective (3.0) at “changing” their career goals (Fig. 3), usually because they had already established goals. Student comments frequently referred to experiences “reinforcing”, “refining”, and “encouraging” with regard to their future careers, suggesting that their goals were being positively affected although not changed.

Student attitudes towards pursuing a career also appear to have been strongly affected (Fig. 3). Students found that experiences were effective at helping them to develop more initiative towards pursuing a career (4.3) and at helping them to be more flexible in their outlook on career possibilities (4.1). Interestingly, these general effects on initiative were rated more highly than the effects of motivating students specifically toward a career in teaching (3.8) or research (3.9).

Teaching/research experiences are also potentially valuable for showing students what they would not be happy with as a career. This was an outcome for two students. One would now prefer to avoid research, and the other is now less inclined to pursue a job in industry. Nevertheless, students on average found that their experiences were not effective at "helping them to determine that they did not" want to teach (2.0) or do research (2.1). In fact, these were the lowest-ranking categories on the effectiveness scale (Fig. 3). Although discovering what one does not like is a valid educational outcome, we view these scores as a further indication that teaching and research experiences are having a positive influence on students.

Summary

For college/university departments with teaching, research, and/or internship requirements, assessment can be very useful for improving experiences and better advising students. In the Botany Department at N.C. State University, we found that those doing research are involved mainly in data collection and analysis, whereas those who are teaching are mainly involved with hands-on laboratory instruction. Nearly all students rated their experiences as very good overall and would recommend them to other students. Several positive educational outcomes were rated especially high, including a greater appreciation for teaching/research, greater initiative towards pursuing a career, an increase in skills, and greater consideration of graduate school. Students also found that the experiences were effective at building a range of "leadership skills", but rated academic-related outcomes relatively low. Our results have been used to determine what students did during research/teaching experiences, the educational outcomes, and the overall success of the requirement. This study has given us knowledge of how to improve particular experiences and better advise students on which experiences to pursue in the future. Because every department (and every student) is different, we anticipate that much of the value of this study lies in the actual survey instrument and strategy for analysis. Therefore, we invite others to adapt this assessment strategy in their own departments.

Acknowledgements

We thank Drs. Gary Moore and Jim Flowers (Dept. of Agricultural and Extension Education at N.C. State) for their valuable feedback on an early draft of the survey, Dr. Arnold Oltmans (Dept. of Agricultural and Resource Economics at N.C. State) for commenting on the manuscript, Sophia Clotho for her advice, and Botany undergraduates for completing surveys.

Literature cited

Boyer Commission on Educating Undergraduates in a Research University. 1998. Reinventing undergraduate education: A blueprint for America’s research universities. . Accessed 18 Feb 2004.

Chaplin, S.B., J.M. Manske, and J.L. Cruise. 1998. Introducing freshmen to investigative research – A course for biology majors at Minnesota’s University of St. Thomas. Jour. Coll. Sci. Teach. 27: 347-350.

Coker, J.S. and E. Davies. 2002. Involvement of plant biologists in undergraduate and high school student research. Jour. Nat. Resour. Life Sci. Educ. 31: 44-47.

Druger, M. 1997. Preparing the next generation of college science teachers. Jour. Coll. Sci. Teach. 26: 424-427.

Fingerson, L. and A.B. Culley. 2001. Collaborators in teaching and learning: Undergraduate teaching assistants in the classroom. Teaching Sociology 29: 299-315.

Henderson, L. and C. Buising. 2000. A research-based molecular biology laboratory. Jour. Coll. Sci. Teach. 30: 322-327.

Howard Hughes Medical Institute. 2002. Undergraduate science education at research universities. . Accessed 18 Feb 2004.

Landrum, E.R. and L.R. Nelsen. 2002. The undergraduate research assistantship: an analysis of the benefits. Teaching of Psychology 29: 15-19.

Mabrouk, P.A. and K. Peters. 2000. Student perspectives on undergraduate research experiences in chemistry and biology. CUR Quarterly, Sept.: 25-33.

McLean, R.J.C. 1999. Original research projects – A major component of an undergraduate microbiology course. Jour. Coll. Sci. Teach. 29: 38-40.

Miller, J.E., J.E. Groccia, and M.S. Miller (Eds.). 2001. Student-assisted teaching: A guide to faculty-student teamwork. Anker Publ. Co.: Bolton, MA.

National Science Foundation. 1996. Shaping the future: New expectations for undergraduate education in science, mathematics, engineering, and technology. . Accessed 18 Feb 2004.

National Science Foundation. 2003. Grant proposal guide (NSF 03-041). . Accessed 18 Feb 2004.

Research Corporation. 2001. Academic Excellence: The Sourcebook. . Accessed 18 Feb 2004.

Service, R.F. 2002. New lure for young talent: extreme research. Science 297: 1633-1634.

Socha, T.J. 1998. Developing an undergraduate teaching assistant program in communication: Values, curriculum, and preliminary assessment. Jour. Assoc. for Communication Admin. 27: 77-83.

Solomon, S.S., S.C. Tom, J. Pichert, D. Wasserman, and A.C. Powers. 2003. Impact of medical student research in the development of physician-scientists. Jour. Investig. Med. 51: 149-156.

Sundberg, M.D. and J.E. Armstrong. 1993. The status of laboratory instruction for introductory biology in the U.S. universities. Amer. Biol. Teacher 55: 144-146.

Sundberg, M.D., J.E. Armstrong, M.L. Dini, and E.W. Wischusen. 2000. Some practical tips for instituting investigative biology laboratories. Jour. Coll. Sci. Teach. 29: 353-359.

[Figure 1 graphic: horizontal bar chart; y-axis (teaching activity): had training in teaching techniques, wrote objectives, helped develop a lecture or lab, set up a lab, gave brief presentations in lab, gave full-length lecture(s), wrote exams, graded exams, presented course material on the internet, routine tasks, collected teaching research data, analyzed teaching research data; x-axis: level of involvement (1-4).]

Figure 1. Average levels of student involvement in typical teaching-related activities, based on the following scale: 1 – not involved, 2 – somewhat involved, 3 – involved, 4 – very involved. Error bars represent standard error.

[Figure 2 graphic: horizontal bar chart; y-axis (research activity), top to bottom: made observations that led to a hypothesis, formulated hypothesis based on observations, designed experiments, wrote grant proposal, performed experiments, collected data, analyzed data, interpreted results of experiment, made conclusions about results, presented research orally, presented research as a poster, submitted a manuscript for publication, presented research on the internet; x-axis: level of involvement (1-4).]

Figure 2. Average levels of student involvement in typical research-related activities, based on the following scale: 1 – not involved, 2 – somewhat involved, 3 – involved, 4 – very involved. Error bars represent standard error.

[Figure 3 graphic: horizontal bar chart of outcome statements completing "My research/teaching/other experience has...", listed from "Given me a new appreciation for teaching" at the top to "Helped me to determine I do NOT want to teach" at the bottom; x-axis: effectiveness scale (1-5).]

Figure 3. Student perceptions of their research and/or teaching experience, based on the following effectiveness scale: 1 – not applicable, 2 – not effective, 3 – somewhat effective, 4 – effective, 5 – very effective. Error bars represent standard error.
