Insights Into Papaya Genome Organization Based on BAC End Sequence Analysis
INSIGHTS INTO PAPAYA GENOME ORGANIZATION BASED ON BAC END SEQUENCE ANALYSIS

A THESIS SUBMITTED TO THE GRADUATE DIVISION OF THE UNIVERSITY OF HAWAI'I IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE IN MOLECULAR BIOSCIENCES AND BIOENGINEERING

AUGUST 2006

BY CHUN WAN JEFFREY LAI

Thesis Committee:
Gernot Presting, Chairperson
Paul Moore
Qingyi Yu
Richard Manshardt

We certify that we have read this thesis and that, in our opinion, it is satisfactory in scope and quality as a thesis for the degree of Master of Science in Molecular Biosciences and Bioengineering.

ACKNOWLEDGEMENTS

I am obliged to Dr. Maqsudul Alam and Dr. Shaobin Hou from the Center for Genomics, Proteomics and Bioinformatics Research Initiative (CGPBRI) at the University of Hawaii, who generously provided 11,013 papaya BAC end sequence chromatogram files for this analysis. Dr. Ray Ming, who is a co-principal investigator of this project, and Dr. Qingyi Yu from the Hawaii Agriculture Research Center (HARC) directed the sequencing of most BAC ends analyzed here. Peizhu Guan from HARC and Kanako L. T. Lewis from CGPBRI contributed significantly by generating papaya BAC end sequences.

I would like to specially thank my thesis committee members, who contributed substantially: Dr. Gernot Presting (Chairperson), Dr. Paul Moore, Dr. Qingyi Yu and Dr. Richard Manshardt. Dr. Gernot Presting supervised the entire project, recruited me into his lab during my second semester, and trained me in bioinformatics research and critical thinking. Dr. Paul Moore provided invaluable advice and comments for this analysis. I am honored that Dr. Richard Manshardt, who has many years of experience in papaya genetics research, is on the thesis committee. I am indebted to Dr. Qingyi Yu, who directed the generation of most BAC end sequences used in this project. I am indeed thankful that Dr.
Dulal Borthakur offered me a chance to pursue a Master of Science degree in the Department of Molecular Biosciences and Bioengineering. I thank all faculty and staff who have enlightened and assisted me during my time at the University of Hawai'i. Anupma, Aren, Beth, Kevin, Moriah, and Thomas offered me friendship and precious insights in the lab. Finally, my mother Hay Hing, my sisters Kit Chi and Kit Hing, my wife Chui Ying, and my son Jit Ching have supported my education in every possible way.

ABSTRACT

Papaya is a major tree-fruit plant grown mainly in tropical and subtropical regions. A BAC library was end sequenced and the resulting 50,661 BAC-end sequences (BESs) were analyzed bioinformatically. A total of 7,456 SSRs were identified among 5,452 BESs. Sixteen percent of BESs contain plant repeat homologies. BESs lacking plant repeats revealed 6,769 (19.1%) Arabidopsis cDNA homologies. BESs without plant repeat and Arabidopsis cDNA homology contained 1,124 (3.2%) RefSeq and 644 (1.8%) non-redundant protein sequence homologies. Low-copy papaya BES pairs (9,038) were compared to the Arabidopsis, poplar, and rice genome sequences. A total of 53 BES pairs were mapped to Arabidopsis, 167 to poplar and 11 to rice. The low rate of co-mapping of papaya BES pairs to Arabidopsis confirms the recent genome rearrangement in Arabidopsis. Poplar exhibited the highest level of colinearity with papaya and can serve as a reference genome for papaya genomic studies.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
INTRODUCTION
METHODS
RESULTS
DISCUSSIONS
CONCLUSIONS
APPENDIX A: Papaya BAC end sequence library website (PBESL)
APPENDIX B: Alignments of three peach BAC clones to poplar genome
APPENDIX C: Detailed coordinates of co-mapped papaya BES pairs in heterologous genome comparisons
APPENDIX D: Lists of compiled plant repeat databases for repeat content analysis of papaya BESs
LITERATURE CITED

LIST OF TABLES

1 Summary of simple sequence repeats percentage of 35,472 high quality BESs
2 BESs homologous to protein sequences from other plants but not found in Arabidopsis
3 Summary of the annotation of top ten most abundant repeats and the number of matches in uncharacterized BESs
4 Statistics of number of BES pairs in different stages of genome mapping
5 Structure of table pBES
6 Structure of table SSR
7 Structure of table cDNA, REPx, REFSEQ, NR
8 Structure of table SEQLEN
9 Structure of table GMAP
10 Detailed coordinates of co-mapped papaya BES pairs in heterologous genome comparisons
11 Lists of compiled plant repeat databases for repeat content analysis of papaya BESs

LIST OF FIGURES

1 Papaya BAC-end sequencing procedure
2 Comparative genome mapping procedure
3 Taxonomy tree of Caricaceae and other species that were used in comparative mapping of BES pairs
4 Selection of sequence length cutoff in high quality papaya BES
5 Characterization of all simple sequence repeats (SSRs) detected in 35,472 high quality BESs
6 Summary of plant repeat element content in 35,472 high quality BESs
7 Comparative genome mapping of BES pairs in heterologous plant genomes
8 Project website index page
9 Microsatellite search engine
10 BAC-end sequence search engine
11 Comparisons of three peach BAC clones to the poplar genome

LIST OF ABBREVIATIONS

BAC - Bacterial artificial chromosome
BES - BAC end sequence
Mbp - Megabase pairs
SSR - Simple sequence repeat
bp - Base pairs
k - Thousand
kb - Kilobase
nt - Nucleotide

INTRODUCTION

Production

Carica papaya L., commonly known as papaya, is grown primarily in tropical and subtropical regions of the world.
The major papaya-producing regions are Brazil, Mexico, Nigeria, India, Indonesia, Thailand, Belize, Guatemala, the Dominican Republic, Jamaica, Puerto Rico, the Philippines, and Hawaii, the major papaya-producing state in the United States (Perez and Pollack, 2006). In 2005, the total harvested area of Hawaii papayas was 1,480 acres, and the total utilization production of papaya in Hawaii was estimated at 32.9 million pounds, valued at $10.97 million (Agricultural Statistics Board, 2006; Perez and Pollack, 2006). Papaya fruit production is a major source of income for Hawaii's agriculture industry. Most papayas are grown in the Puna district of the Island of Hawaii, and 91 percent of the harvested acres in the State of Hawaii are located on the Island of Hawaii (Perez and Pollack, 2006). The genetically modified strain 'Rainbow' accounts for over half of the total acreage of current Hawaii papaya production.

Varieties & Strains

In Hawaii, the major papaya cultivar is 'Solo' and its derivatives (Morton, 1987). 'Solo' has a high sucrose content and is usually eaten without cooking to retain its original sweetness. 'Solo' papaya fruits are usually pear-shaped and relatively small (~500 g) compared to other commercial papayas. Representative 'Solo' derivatives include 'Kapoho', 'Sunrise', 'Sunset', 'Waimanalo' and 'SunUp'. 'Kapoho' was the major commercial variety of 'Solo' before the papaya ringspot virus became prevalent, occupying approximately one-third of the total acreage. Because 'Kapoho' is susceptible to the papaya ringspot virus, the transgenic 'Rainbow' papaya has become the preferred variety for the papaya industry. 'Sunrise' and 'Sunset' papaya fruits have red-orange flesh and are rich in sucrose. 'Waimanalo', or 'Solo' line 77, produces fruits that are excellent in firmness and quality.