New Jersey Institute of Technology Digital Commons @ NJIT

Dissertations Electronic Theses and Dissertations

Spring 5-31-2007

Prediction of mRNA sites in the and Mathematical modeling of alternative polyadenylation

Yiming Cheng New Jersey Institute of Technology

Follow this and additional works at: https://digitalcommons.njit.edu/dissertations

Part of the Mathematics Commons

Recommended Citation Cheng, Yiming, "Prediction of mRNA polyadenylation sites in the human genome and Mathematical modeling of alternative polyadenylation" (2007). Dissertations. 813. https://digitalcommons.njit.edu/dissertations/813

This Dissertation is brought to you for free and open access by the Electronic Theses and Dissertations at Digital Commons @ NJIT. It has been accepted for inclusion in Dissertations by an authorized administrator of Digital Commons @ NJIT. For more information, please contact [email protected].

Copyright Warning & Restrictions

The copyright law of the United States (Title 17, United States Code) governs the making of photocopies or other reproductions of copyrighted material.

Under certain conditions specified in the law, libraries and archives are authorized to furnish a photocopy or other reproduction. One of these specified conditions is that the photocopy or reproduction is not to be “used for any purpose other than private study, scholarship, or research.” If a, user makes a request for, or later uses, a photocopy or reproduction for purposes in excess of “fair use” that user may be liable for copyright infringement,

This institution reserves the right to refuse to accept a copying order if, in its judgment, fulfillment of the order would involve violation of copyright law.

Please Note: The author retains the copyright while the New Jersey Institute of Technology reserves the right to distribute this thesis or dissertation

Printing note: If you do not wish to print this page, then select “Pages from: first page # to: last page #” on the print dialog screen

The Van Houten library has removed some of the personal information and all signatures from the approval page and biographical sketches of theses and dissertations in order to protect the identity of NJIT graduates and faculty. ABSTRACT

PREDICTION OF MRNA POLYADENYLATION SITES IN THE HUMAN GENOME AND MATHEMATICAL MODELING OF ALTERNATIVE POLYADENYLATION

y Yimig Ceg

Messege A (mA oyaeyaio ays may imoa oes i e ce suc as

asciio emiaio mA saiiy a asoaio a mA asaio i eukayoic ces Α age ume o uma a mouse gees ae muie

oyaeyaio sies (eee o as oy(Α sies a ea o aiae ascis some o wic ae asae io aious oei oucs wi iee ucios owee

e eais aou we a wee e oyaeyaio occus a ow e-mA swices om oe oy(A sie o aoe ae si ukow is ki o 3 -e

ocessig ca e eguae y e ce eiome ce cyce sage a issue ye

I is geeay accee a e ceaage o e-mA is ase o e sequece o uceoies aou e oy(Α sies So i is ossie o eic e oy(Α sies accuaey ase o e e-mA sequece o accomis e sueise eicio o a oy(Α sie a se o saisica moes as ee use suc as iea iscimia aaysis quaaic iscimia aaysis a suo eco macie (SM Amog

ese SM was cose as e cassiicaio agoim o e eicio o oy(A sies i is wok Α ogam cae oya sm as ee eeoe usig E e ue

osiie a accuacy esus oaie usig is meo ae ee a e esus oaie usig oe commoy use agoims Comae wi e micoaay ecique seia aaysis o gee eessio

(SAGE is aoe oweu ecoogy o measuig e mA eessio ees Ou suy is e is iesigaio o e eguaio o e ascis om e same gee y aayig e SAGE aa y ieig e oise aa om e aaase a cacuaig

e coeaio ewee ascis om e same uigee cuse some sigiica gees ae ou o ae muie ascis wi oosie eessio ees ese gees mig

e ey ieesig o ioogiss a ey ae wo eig eiie y ioogica eeimes

Aeaie oyaeyaio as ee ou o e ey commo i uma a mouse gees ecey I as ee eiee a e seecio o iee oy(A sies is

eae o ioogica acos suc as e eeomea sages ce coiios a e aaiaiiy a auace o some oei acos owee i is o cea ow ese

acos aec aeaie oyaeyaio Maemaica moeig is aie o uesa e yamica seecio o oy(Α sies Ceaage simuaio aco (Cs is a ey imoa oei come equie o eicie ceaage coaiig suuis o 77 a 5 k (Cs-77 Cs- Cs-5 I as ee ou a uma cstf-77 gee as seea iee ascis ue o e aeaie oyaeyaio a e eessio ees o ese ascis isay some auo-eguaio A maemaica moe wi a ime eay is cosuce o simuae e yamica gee eessio ees o gee cstf-77. Eeimea aa ae comae wi e moe is ki o maemaica moe ca aso e eee o some oe oyaeyaio acos a ae simia aeaie oyaeyaio aes EICIO O MA OYAEYAIO SIES I E UMA GEOME A MAEMAICA MOEIG O AEAIE OYAEYAIO

y Mimig Ceg

A isseaio Sumie o e acuy o ew esey Isiue o ecoogy a uges e Sae Uiesiy o ew esey - ewak i aia uime o e equiemes o e egee o oco o iosoy i Maemaica Scieces

eame o Maemaica Scieces I eame o Maemaics a Comue Sciece uges-ewak

May 7 Coyig © 7 y Mimig Ceg

A IGS ESEE APPROVAL PAGE

PREDICTION OF MRNA POLYADENYLATION SITES IN THE HUMAN GENOME AND MATHEMATICAL MODELING OF ALTERNATIVE POLYADENYLATION

Ceg Yimig

Dr. Robert M. Miura, Dissertation Advisor Date Professor of Mathematical Sciences, NJIT

Dr. Bin Tian, Dissertation Co-Advisor Date Assistant Professor of Biochemistry and Molecular Biology, UMDNJ

Dr. Bruce Byrne, Committee Member Date Adjunct Professor of Medicine and Molecular Genetics & Microbiology, and Associate Director for Education, the Information Institute, UMDNJ

Sui Κ a Commiee Meme Date Associae oesso o Maemaica Scieces ΙΤ

Dr. Jorge P. Golowasch, Committee Member Date Associate Professor of Mathematical Sciences, NJIT and Biological Sciences, Rutgers-Newark IOGAICA SKEC

Auo: Miming Cheng

egee: Doctor of Philosophy

ae: May 2007

Uegauae a Gauae Eucaio:

• Doctor of Philosophy in Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ, 2007

• Master of Science in Computational Mathematics, Fudan University, Shanghai, . . China, 2002

• Bachelor of Science in Pure Mathematics, Fudan University, Shanghai, P. R. China, 1999

Major: Mathematical Sciences

eseaios a uicaios:

Y. Cheng, R. M. Miura, B. Tian, "Highly accurate prediction of mRNA polyadenylation sites using a support vector machine", Bioinformatics 22(19):2320-5, 2006.

Y. Cheng, Z. Kang, W. Wang, "Minimal Residual Polynomial Method on the Asymmetric System". Journal of Fudan University (Natural Science), Vol. 42:160-4, 2003.

Y. Cheng, R. M. Miura, B. Tian, "Prediction of Polyadenylation Sites Using Support Vector Machines", SIAM Conference on the Life Science, Raleigh, NC, July 31-Aug. 4, 2006.

Y. Cheng, R. M. Miura, B. Tian, "Prediction of Polyadenylation Sites Using Support Vector Machines", SIAM Annual Meeting Boston, Boston, MA, July 10-14, 2006 (Invited talk by SIAM Chapters).

iv C. Cheng, Y. Cheng, P. Dubovski etc., "Backward diffusion methods for digital halftoning", Mathematical Problem (IBM) in Industry Workshop, Olin College, Needham, MA, June 12-16, 2006.

Aweei A Cakaoy Y Ceg Μ aki Kie ee "Extracellular Matrix Accumulation & Nutrient Diffusion in Hydrogel", the 3rd Mathematical Modeling Camp, RPI, Troy, NY, June 6-9, 2006.

Y. Cheng, R. M. Miura, B. Tian, "Prediction of Polyadenylation Sites Using Support Vector Machines", Frontiers in Applied and Computational Mathematics, NJIT, Newark, NJ, May 15, 2006.

Y. Cheng, R. M. Miura, B. Tian, "Prediction of Polyadenylation Sites Using Support Vector Machines", NJIT the 2nd Annual Provost's Student Research Showcase, Newark, NJ, Apr. 12, 2006.

Y. Cheng, R. M. Miura, B. Tian, "Prediction of Polyadenylation Sites Using Support Vector Machines", NJIT Graduate Student Research Day, Newark, NJ, Mar. 6, 2006.

Y. Cheng, R. Jin, C. Wang, J. Wu, P. Yao, "Woe Geome Miig asciioa acos aes o Eα i eas Cancer using ChIP-Chip", MBI program at Ohio State University, Columbus, OH, Aug. 1-19, 2005.

Y. Cheng, R. M. Miura, "Analysis of Equivalent Distorted Ratchet Potentials", Einstein in the City: A Student Research Conference at The City College of New York, New York, NY, Apr. 11-12, 2005.

Y. Cheng, R. M. Miura, "Analysis of Equivalent Distorted Ratchet Potentials", 2005 Eastern Sectional Meeting of the American Mathematical Society, Newark, DE, Apr. 2-3, 2005 Το e Memoy o iau Ceg

i ACKOWEGME

I wou ike o eess my ee a sicee gaiue o my aiso oe M Miua

o is isigu guiace u suo a coiuous ecouageme is way o aisig is ey eecie o me e guie me o e ig iecio wiou coiig my ougs a ieas Miua as ee aways aaiae we I a quesios o coces aou my eseac my Egis o ee my uue caee a o a I ee geay aku I am ey ouae o ae is kiy a kowegeae aiso wo

ee me oecome may osaces a guie me io may ew eseac aeas

I wou aso ike o ak my co-aiso i ia om Uiesiy o

Meicie a eisy o ew esey Wiou is sueisio I wou ee kow wa is ioiomaics wa aes o e oyaeyaio ocess isie e ce; wiou is e I wou ee ge e oouiy o ay maemaics o e seciic

ioogica oems is aice o my wiig a ocumeaio is ey imoa o my uue Secia aks o uce ye Sui K a a oge

Goowasc o aciey seig as my isseaio commiee ei suggesios a commes ae ey eu May aks o Amiaa K ose emeios

aageogiou Yua Youg Siog iag a Syese omso om

ao Uiesiy o a kis o e owas o e comeio o e esis

e caieso o eame o maemaica scieces ai S Auwaia is ake o e iacia suo a coiuous ecouageme May aks o e

ome a cue oice sa Susa Suo ama Guai Aa eeso Sei M

ow iiaa oa a Eee Ma ei e wi e o-acaemic issues mae my ie easie a I I wou ike o eess my sicee gaiue o my oommaes

vii ao Wag ua e a i Ceg Wi ei accomayig I ae a ey memoae a eaceu ee-yea aiso ie I wou aso ak ii Wag Sieg Gu Kua

u ua Gog Yu ag Cuseg Yag eua a a Micae sai Ae meeig em ie ecomes moe iey a eeaiig

I am ey iee o my wie aoia i o aways eig ee o me a giig me gea suo a ecouageme I o ee kow ow I wou ge e

esis oe wiou e cosa oe a eie i me I wou aso ike o ak my

ies a oso Guaog ag uua Cai a i Wag Wiou em I ca image wa ki o weeke ie I wou ae My eae aks go o my aes a sises o ei kiess oe aiece uesaig a suo oe e yeas

iii TABLE OF CONTENTS

Chapter Page

1 IOUCIO 1

11 ackgou Iomaio 1

1 Summay o esus

IOOGICA ACKGOU 5

1 uma Geome 5

Gee Eessio

3 mA oyaeyaio a Cis-eeme 9

3 EICIO O OYAEYAIO SIES USIG SUO ECO MACIES 11

31 iee Cis-eemes 1

3 aases a osiio Seciiciy Scoe Mai 1

33 Coeaio ewee iee Cis-eemes 15

3 aiig esig a eicio 17

31 Comaiso o iee Meos 1

3 eae-Oe-Ou Meo 19

33 oya_sm 1

35 eicio oy(Α Sies i e uma Geome 5

3.6 Further Improvements of Polya_svm 28

SAGE AA AAYSIS 3

1 Ioucio o SAGE 3

i

AE O COES (Coiue

Cae age

4.2 Analysis Pipeline .. 34

4.2.1 Reliable Libraries Selection 34

4.2.2 Is SAGE Data Tissue-Specific? 36

4.2.3 The Reliable Tags ..... 38

4.2.4 Significant Unigenes 39

4.3 Results and Conclusions .. 41

5 ALTERNATIVE POLYADENYLATION .. 43

5.1 Introduction to Alternative Polyadenylation 43

5.2 Factors Involved in Polyadenylation .. 45

5.3 Proposed Mechanisms for Alternative Polyadenylation 48

531 ye II Gee oy(Α Sie Swicig 48

53 ye III Gee oy(Α Sie Swicig ewee 3U a 3 5

5.4 Proposed Model for Alternative Polyadenylation of Type III 54

6 MATHEMATICAL MODELING OF CSTF-77 ALTERNATIVE POLYADENYLATION 60

6.1 Introduction to CstF-77 60

6.2 Proposed Model of Alternative Polyadenylation for CstF-77 ... 62

6.3 Mathematical Description 65

6.4 Theoretical Analysis of Differential-Delay Equation .. 67

6.5 Numerical Simulations 78

6.6 Parameter Estimation ... 79 AE O COES (Coiue

Cae age

7 esus 3

71 Eeimea aa 3

7 Oima aamee Se 3

73 iee Iiia ucios 5

7 asi o Aacio a asi ouay 9

Cocusios a iscussio 99

7 SUMMAY A UUE WOK 1

AEI A SAISICA AGOIMS A SIGIICACE 1

AEI OGAMMIG AGUAGES 11

AEI C SAGE MAEIAS 113

EEECES 11

xi

LIST OF TABLES

Table Page

3.1 Comparison of LDA, QDA, and SVM 19

3.2 Effects of Leaving Out Some Cis-elements .... 20

3.3 Comparison of Polya svm with Polyadq for Different Types of osiie oy(Α Sies 26

3.4 Comparison of Polya_svm with Polyadq for Different Types of egaie oy(Α Sies 28

3.5 Comparison of Different Methods for Replacing Infinite Values .. 29

4.1 SAGE Data Example .... 35

4.2 Nine Significant Unigenes with Negatively Regulated Transcripts ... 41

A.1 First Order Markov Chain Transition Matrix .. 106

C.1 Sixty-one Significant Unigenes with Negatively Regulated Transcripts 114

xii

LIST OF FIGURES

Figure Page

2.1 Central dogma of biology .... 7

2.2 Central dogma of biology: feedback network 7

2.3 RNA synthesis starting at the promoter and ending at the terminator 8

2.4 Diagram of expression 9

3.1 Fifteen cis-elements 13

3.2 One example sequence from polyA_DB 14

3.3 The PSSM of AUEl ... 15

3.4 Cuseig o 15 cis-eemes usig oy(Α sies om oyA_ 16

3.5 Heatmap plot of predicted probability results of 100 test sequences 22

3.6 Scemaic o 15 cis-eemes i e oy(Α egio a e seac algorithm for polya_svm .. 23

3.7 Prediction of positive and negative sequences 24

3.8 HPR vs relative maximum accuracy ... 30

3.9 oo o oya sm scoes o iee yes o oy(Α sies .. 31

4.1 An outline of SAGE 32

4.2 The pipeline of SAGE data analysis...... 35

4.3 Correlation between the number of unique tags and the total number of tags .... 37

4.4 The levels for different tissues 37

4.5 T-value distribution of random tags 40

4.6 The transcripts structure of Unigene Hs.334885 41 IS O IGUES (Coiue

igue age

5.1 Scemaic eeseaio o oy(Α sies .. 44

5.2 Schematic representation of the steps involved in the mammalian pre-mRNA 3' process 46

5.3 Protein factors involved in polyadenylation ... 47

5.4 ye II gee oy(Α sie swicig 48

5.5 ye III gee oy(Α sie swicig 53

5.6 Competition between a SS and a PS .. 56

5.7 Scatter plot of SC5 and PAs vs DIS 58

5.8 Polyadenylation assembly time + RNAP traveling time vs splicesome assembly time 59

6.1 SAGE library data for CstF-77.L and CstF-77.S 61

6.2 The transcript structure of gene cstf-77 ... 61

6.3 Alternative polyadenylation of CstF-77 64

6.4 Parameter phase plane for the linear DDE 71

6.5 The phase plane of the long and short form transcripts .. 79

6.6 Relationship between the delay time and the period of Equation (6.3) .. 81

6.7 Experimental data at five time points .. 83

6.8 Solutions fitted with optimal parameter set .. 84

6.9 Distribution of parameters with error less than 0.5 85

6.10 Solutions of Equation (6.3) with long form CIF set to be 0.0652 ... 87

6.11 Solutions of Equation (6.3) with long form CIF set to be 0.0653 ... 87

i IS O IGUES (Coiue

igue age

1 Souios o Equaio (3 wi og om CI se o e 5

13 Souios o Equaio (3 wi so om CI se o e 1

1 Souios o Equaio (3 wi so om CI se o e 5 9

15 Souios o Equaio (3 wi iee CIs 9

1 asi ouay o CI 91

17 o o Equaio (17 93

1 e iucaio souios wi so om CI equa o 1 9

19 e iucaio souios wi so om CI equa o 1 9

e iucaio souios wi so om CI equa o 1 95

1 Souios wi iee og a so om CIs 9

Souios wi CI saig ea e ie oi 97

3 ase ae o e imi cyce souio 9

yamica eaio o Equaio (3 wi oima aamees 99

Α1 iscimia oem 1

Α A a QA 19

Α3 Suo eco macie agoim 19

C1 e ascis sucue o eig sigiica uigees 11 LIST OF ABBREVIATIONS

Abbreviations Full Descriptions

A Aeie

AE Auiiay owseam Eeme

AUE Auiiay Useam Eeme

C Cyosie

CC Coeaio Coeicie

CE Coe owseam Eeme

cA comemeay A

CS Coig Sequeces

C Ceaage aco

CI Cosa Iiia ucio

CS Ceaage a oyaeyaio Seciiciy aco

Cs Ceaage simuaio aco

Cs-77 Cs-77 og om asci

Cs-77S Cs-77 So om asci

C C-emius omai

CUE Coe Useam Eeme

E ieeia-eay Equaio

A eoyiouceic Aci

A owseam oyaeyaio Sie

SE owseam Eeme

xvi Aeiaios u esciios

ES Eesse Sequece ag

ase egaie

ase osiie

G Guaie

MM ie Mako Moe

ig oaiiy egio

I Iiia ucio

A iea iscimia Aaysis

MC(Μ Mako Cai (Moe mA messege iouceic Aci

uceoies

A oyaeyaio oymeise

As oyaeyaio Sie Scoe

AS oyaeyaio Siga

C oymeise Cai eacio

oy(Α sie oyaeyaio Sie

SSM osiio Seciic Scoe Mai

QA Quaaic iscimia Aaysis

A iouceic Aci

A A oymeise II

SAGE Seia Aaysis o Gee Eessio

SC5 5 Sicig Sie Scoe

xvii Aeiaios u esciios

SN Sesiiiy

S Seciiciy

SS Sicig Sie

SVM Suo eco Macie

Τ ymie

ΤΒΡ AA iig oei

ue osiie

TIS asciio Iiiaio Sie

ue egaie

ΤΡΜ ag e Miio

UPA Useam oyaeyaio Sie

USE Useam Eeme

U Uasae egio

iii CHAPTER 1

INTRODUCTION

1.1 Background Information

e amou o iomaio i e ie scieces is eaig amaicay Comue

ecoogies a maemaica moeig ae cagig e way we ook a iig ces

Sice iig igs ae ey come sysems i is ey iicu o uesa ese sysems usig aiioa ioogica isciies Ieisciiay eseac is iceasigy imoa o soe suc come sysems a ieisciiay eucaio

as ee oose (Sug e a 3 I is wok seea seciic ioogica

oems ae ee iesigae om aious iewois icuig moecua

ioogy ioiomaics a maemaica moeig ee ae a uge ume o come ocesses occuig i uma ces eey seco suc as asciio a

asaio uma ces coai miios o oeis u coai ess a 3 gees (Sei Seea imoa acos coiue o e comeiy o ow

iee oeis ae mae om e same gee Aeaie oyaeyaio

ocessig is oe o em a e eaie mecaism is si ukow

e messege A (mA oyaeyaio is a os-asciioa

ocess icuig wo igy coue ses (Coga a Maey 1997 a eouceoyic ceaage o e-mA oowe y e oymeiaio o a aeosie ai I is iesigaio ee oems associae wi mA

oyaeyaio i uma ces ae ee eoe Oe oem is e eicio o

1 2

e oyaeyaio sies (oy(A sies usig a suo eco macie (SM oe

o e es cassiicaio agoims A mao imoeme i e eicio accuacy

is aciee comae wi oyaq (aaska a ag 1999 a ey commoy

use oy(Α sie eicio agoim Aoe oem is o use e seia aaysis

o gee eessio (SAGE (ecuescu e a 1995 aa o iscoe gees a ae

moe a oe mA asci ue o aeaie oyaeyaio a ese

ascis ae sow some egaie eguaio i gee eessio is is a oe

suy a some o e esus may e ey useu o ioogiss a oie some

isig o aeaie oyaeyaio e i oem is e maemaica

moeig o aeaie oyaeyaio o gee cstf- 77, wose oei oucs ae

ee sow o e ey imoa i eguaig e oyaeyaio o oe gees A

maemaica moe wi a ime eay is oose o eaiig e mecaism o

aeaie oyaeyaio Some eimiay eeimea aa ae ee use o

aiae e moe a i e moe we

1.2 Summary of Results

eicios o oy(Α sies ae ee aeme y seea gous uig e as

seea yeas (Saamo a Sooye 1997; aaska a ag 1999; egee a

Gauee 3; a aais e a ey ae use iee meos icuig

iea iscimia aaysis quaaic iscimia aaysis a ie Mako

moe Seea ogam ackages ae ee eeoe amog wic oyaq

(aaska a ag 1999 is a commoy use oo o oy(Α sie eicio I e

ese wok a sa-aoe ogam cae oya_sm (Ceg e a as ee

eeoe o oy(Α sie eicio usig e 15 cis-eemes (u e a 5 a

SM o e oea eomace oya_sm is 337% moe sesiie a oyaq 3

(5% s 395% i sesiiiy (S Some imoemes i accuacy ae ee mae sice e eease o e is esio

Seia aaysis o gee eessio (SAGE (ecuescu e a 1995 is a

oweu ecoogy o eemie gee eessio y simuaeous measueme o

e equecies o a age ume o sequece ags eie om mA/ES

ascis Uike micoaay gee eessio aa SAGE aa ae useu o eec

e iee A ascis om e same gee ecey moe a moe gees

ae ee ou o geeae moe a oe mA asci a us ey may

ouce moe a oe oei ie is kow aou e eaie eessio ees

amog ese ascis o a e ascis ae oei-coig As a some

gees may make use o is o coo e oei ouc iiecy I a is e case

e eessio ees o e coig ascis a o-coig ascis o e same

gee sou e oosie o ey sou e egaiey eguae is may eea a ukow mecaism o aeaie oyaeyaio ee y usig SAGE aa we

iscoee a ume o gees wic ae wo o moe ascis ue o aeaie

oyaeyaio wi oosie eessio ees

ike aeaie iiiaio a aeaie sicig aeaie oyaeyaio

coiues o e comeiy o e oea oo o mA ascis i uma ces

owee i is si o cea ow e ce uiies iee oy(Α sies i esose o

iee ce coiios a eeomea sages Seea oei acos ae ee

ieiie as eig ecessay o i io ceaage a oyaeyaio I e gee o

e oyaeyaio aco as muie oy(Α sies oucig muie oei

oucs wi iee ucios o oyaeyaio e i cou coo e oea

oyaeyaio aciiy y usig aeaie oy(A sies I wou aso aec is

ow ouciiy a us om a sime ee-ack oo 4

e yamics o e eessio ees ewee e wo ascis cou e

ee uesoo y ayig maemaica moeig o suy e uiiaio o

iee oy(Α sies Ceaage a simuaio aco Cs is a ey imoa

oyaeyaio aco comose o ee suuis Cs-5 Cs- a Cs-77

ecey gee cstf- 77 as ee ou o coai a ioic oy(Α sie a us as moe a oe iee mA asci (a e a e use o a owseam

oy(Α sie geeaes a ucioa oei aco wic is kow o e oe suui o

Cs e oei geeae om e ioic oy(Α sie is si ukow a we

oe o iscoe e ucio y e simuaio I is iesigaio we ae

oose a oe maemaica moe cosisig o ieeia equaios wi a ime

eay o suy e yamica eaio o e gee eessio ees eoeica aaysis as ee oie aog wi some umeica simuaios o e equaios

Some eeimea aa ae eiie e umeica esus a ue iscussio as

ee gie o ou kowege is is e is ime a maemaica moeig as

ee aie o suy aeaie oyaeyaio a is may oie some eaaio o e osee ioogica aa CHAPTER 2

BIOLOGICAL BACKGROUND

e eomeo o eeiy is a cea eaue i e eiiio o ie A iig ces soe ei eeiay iomaio i e om o oue-sae

eoyiouceic aci (A o cay ou is iomaio-soage ucio A mus eess is iomaio o guie e syesis o oe moecues i e ce suc as As a oeis umas ae mui-ceua ogaisms mae u o iios o iiiua ces Wii eac uma oy eey ce coais eacy e same A ece we muaios occu i some ces owee eac ce eesses is A iomaio ieey i iee issues a is accous o e ieeces i gee eessio

2.1 Human Genome

e uma geome oec (G as ee egae as e mos imoa se i sciece i ece yeas Uo e comeio o e sequecig o e uma geome i 3 is wea o aa as ee e oec o may ew iesie scieiic

eseac aeas suc as geomics aa aaysis a maemaica moeig I e geome ca e ieee coecy we wi e ae a muc ee uesaig o uma eigs a e ioogica ocesses occuig isie us

A geome is a ogaisms comee se o A wic is mae om ou

uceoie ases Aeie (A ymie ( Guaie (G a Cyosie (C ese

ou uceoies om wo ase ais amey A aways ais wi a G aways

ais wi C e G was is aicuae i 19 y a secia commiee o e US

aioa Acaemy o Scieces a ae aoe y e aioa Isiues o ea

5 6

(ΙΗ a e eame o Eegy (OE I ook aou 15 yeas o accomis

uig wic ee ae some miesoes i maig e uma geome

• 1995 e aemoius iueae geome (1 M (eiscma e a 1995 is is e is comee geome o a ee-iig ogaism

• 199 e Saccaomyces ceeisiae geome (11 M (Goeau e a 199 is is e is comee eukayoic geome

• 199 e Caeoaiis eegas geome (97 M (199 is is e is geome o a mui-ceua secies

• e meaogase geome (1M (Myes e a is is e is geome sequece y e woe-geome sogu meo

• 1 e a o uma geome (3G (ee e a 1 (ae e a 1 Aou 9% o e uma geome as ee sequece

• 3 Comeio o e uma geome

e uma geome coais aou ee iio ase ais wic esie i 3

ais o comosomes wii e uceus o a uma ces Eac comosome coais ues o ousas o gees wic cay e iomaio o makig

oeis ee ae --5 gees i e uma geome (Sei a eac makes ee oeis o aeage

2.2 Gene Expression

e sequece iomaio o uma geome is aaogous o a maua o e uma

oy e e quesio emais as o ow is iomaio is asae io makig

e uma oy? ow oes a gee eess is sequece? e cea ogma o moecua ioogy was is euciae o aswe is quesio i 195 a esae i

197 y acis Cick (Cick 197 amey "e cea ogma o moecua

ioogy eas wi e eaie esiue-y-esiue ase o sequeia iomaio

I saes a suc iomaio cao e asee iecy om oei o eie protein or nucleic acid." The idea can be represented in Figure 2.1. The process from

DNA to RNA is called transcription and from RNA to protein is called translation.

Figure 2.1 Central dogma of biology. Genetic information is transferred from DNA to RNA, then to protein.

In reality, the processes occurring in a cell are much more complex. A lot of

factors are involved to control and regulate the transcription and translation. These

factors include the environmental conditions, , and some intermediate

products. Also, some processes are tightly coupled with each other. These could form

a complex feedback network (see Figure 2.2).

Figure 2.2 Central dogma of biology: feedback network. The proteins generated from RNA would affect the transcription, translation, and DNA replication. The microRNA would also affect transcription by RNA interference (RNAi). The interactions form a feedback network.

We e uiesa eessio o a gee is eamie i eai (Oaies a eieg e gee eessio away i eukayoic ces ca e wie as

e oowig sequeia ees

Iiiaio o asciio -i asciio sa caig a 5 e - sicig coue wi asciio ceaage o e-mA - oymeiaio o a aeie

ai maue mA omaio asoaio om uceus o cyoasm - iiiaio o asaio asaio sa oei syesis- oei oig

ucioa oei ocaiaio o oei o iee comame o aiciae i ce aciiy

mA syesis is caaye y A oymeise II (A asciio is iiiae we RNAP is o a secia egio e omoe a e sa o e gee

om is oi A moes aog e A emae syesiig A ui i

eaces a emiao sequece (ewi 3 Sequeces ae coeioay wie so

a asciio ocesses om e (useam o ig (owseam (see igue

3 is coesos o wiig e mA i e usua 5 3 iecio

igue 3 A syesis saig a e omoe a eig a e emiao e omoe is a so A sequece wi eg —35 wic sees as e iig egio o A oymeise II e emiao is e egio wee A oymeise II is eace om e A emae igue ake om (ewi 3 9

In fact, not all genes are coded into proteins. There are two types of genes: one of them is a non-coding RNA gene that represents about 2-5% of the total number of genes and encodes functional RNA molecules, such as tRNA and rRNA; the other is a protein-coding gene that represents the majority of the genes. For the protein-coding genes, not all the nucleotides of the gene are decoded into proteins. Only —5% of its sequence (exon) is decoded, and the remainder of its sequence (introns) is spliced out during the transcription process (see Figure 2.4). Before the formation of a mature transcript (mRNA) from a protein-coding gene, an oligonucleotide adenine is added after the cleavage of the last exon, the so-called mRNA polyadenylation.

Figure 2.4 Diagram of gene expression. Intron is the region spliced out during transcription, and exon is the region not spliced out during transcription. Capping, splicing, and polyadenylation are coupled with transcription.

2.3 MRNA Polyadenylation and Cis-elements

The mRNA polyadenylation is composed of two tightly coupled steps (Colgan and

Manley 1997): the first step is an endonucleolytic cleavage that takes place at the site determined by the surrounding RNA sequence and its binding proteins; and the seco se ioes e oymeiaio o a aeosie ai (eee o as oy(Α tail) with length ranging over 80-500 nucleotides (nt) at the 3' end of the cleaved

A e oy(Α ai is ou i eay a mAs i eukayoes (wi e exception of most histone genes), and it is involved in every aspect of mRNA 1 meaoism icuig mA saiiy mA asaio (acoso a e 199;

Sacs e a 1997 a mA eocaiaio (Wickes e a 1997 So oy(Α

ais (-5 ae aways coeae wi asaio eessio (e Moo e a 5

e kieics o oyaeyaio ca ae a iec imac o mA oucio o eace oyaeyaio a eesse oyaeyaio ae ee sow o cause

uma iseases icuig omoiia (Geig e a 1 a immuoyseguaio (ee e a 1 e eaie mecaism o

oyaeyaio wi e iscusse i a ae secio

Α cis-eeme is a uceoie sequece a as eguaoy ucios i some ceua ocesses a usuay sees as a iig sie o some oeis I is geeay accee a sigas ae equie o iggeig oyaeyaio ea e ceaage sie (Kee e a 1991 Α ume o auiiay useam eemes (USEs o

owseam eemes (SEs ae ee ieiie o iuece oyaeyaio i

ia a ceua sysems icuig S (Caswe a Awie 199 I (ow e a 1991; asamakis e a 199 a uma C comeme (Moeia e a 199

iee caiae cis-eguaoy eemes i umas ae ee ieiie (u e a

5 amog wic some ae cosee i owe secies suc as yeas a as a some ae seciic o umas I is eiee a ese 15 cis-eemes wou cooeae wi eac oe a e gie iomaio aou e oy(Α sie is eas

o e is a o e wok CHAPTER 3

PREDICTION OF POLYADENYLATION SITES USING SUPPORT VECTOR MACHINES

Messege A oyaeyaio is a os-asciioa ocess a as a oy(Α

ai o e ceae e-mA oyaeyaio is iecy ike o e emiaio o

asciio (uaowski 5 Maucio o oyaeyaio as ee imicae i seea uma iseases (ei e a 19; ee e a 1; Geig e a

1

Sigas aou a oy(Α sie ae equie o omoig oyaeyaio e geomic sequece suouig a oy(Α sie is eee o as a oy(Α egio e

uceoie comosiio o uma oy(Α egios is geeay U-ic (egee a

Gauee 3; ia e a 5 Α eame AAUAAA o a cose aia wic is

ocae 1-35 useam o mos uma oy(Α sies is usuay cae e

oyaeyaio siga (AS (ia e a 5 U/GU-ic sequeces ae ocae wii - owseam o e oy(Α sies (auaya e a 3; u e a 5

I aiio a ume o auiiay eemes ae ee ieiie i ia a ceua sysems (Caswe a Awie 199; ow e a 1991; asamakis e a 199;

Moeia e a 199; u e a 5 Yeas a a gees uiie a isic se o cis-eemes o oyaeyaio (Gae e a 1999; ao e a 1999 Wie

AAUAAA is a omie eame ocae useam o oy(Α sies i ese secies i occus o a muc esse ee a i mammaia sysems I aiio UAUA a

UGUA eemes ae e eiciecy eemes ocae 3-7 useam o yeas

oy(Α sies (Gae 3 wic aso ae ee ou o e ucioa eemes i

uma ces (ekaaama e a 5

11 2

eicio o oy(Α sies as ee aeme y seea gous uig e

as seea yeas A eay aoac y Saamo a Sooye (Saamo a

Sooye 1997 use iea iscimia ucio Α ume o aiaes wee use icuig osiio weig maices o e useam AAUAAA eeme a

owseam U/GU-ic eeme isace ewee AAUAAA a U/GU-ic eemes a eame a ie comosiios i o useam a owseam

egios aaska a ag (aaska a ag 1999 eeoe oyaq wic emoye wo quaaic iscimia ucios o sequeces coaiig AAUAAA a AUUAAA e ogam aso uses a osiio weig mai o e owseam sequece a weige aeage o i osiios o owseam eemes a

owseam ie eeeces I aiio weig-mai-oy (egee a

Gauee 3 a ie Mako moe (MM aoaces (Gae e a ;

a aais e a aso ae ee emoye o oy(Α sie eicio

. iee Ciseemes

iee cis-eemes (see igue 31 i oy(Α egio suouig uma oy(Α sies wee ieiie y usig a eame eicme meo cae OE (u e a

5 ese cis-eemes wee suggese o ay eacig oes i mA

oyaeyaio ecause

• ey ae occuig moe oe i uma oy(Α egios comae wi aom sequeces

• ey ae use moe equey o sog oy(Α sies a o weak oes 13

Figure 3.1 iee cis-eemes AUE auiiay useam eemes; CUE coe useam eemes; CE coe owseam eemes; AE auiiay owseam eemes egio i e is coum iicaes e ocaio o e cis-eemes i e oy(Α egio wi e oy(A sie se a a e owseam iecio se as osiie igue ake om (u e a 5

I e oy(A sie is se o a e owseam iecio is eee o as e

osiie iecio e eemes ae isiue as oowig ase o ei ocaios

ou auiiay useam eemes (AUEs i e -1 o -1 egio; wo coe useam eemes (CUEs i e - o -1 egio; ou coe owseam eemes

(CEs i e +1 o + egio; a ie auiiay owseam eemes (AEs i

e +1 o +1 egio (u e a 5

auay ese 15 cis-eemes ca e cosiee as 15 aiaes a use i macie-eaig oos o oy(Α sie eicio In aicua meos a ake io accou ieacios ewee e aiaes ae mos suiae o eicig oy(Α sies is is ecause cis-eemes ae ecogie y A-iig oeis uig mA oyaeyaio suc as CS-1 iig o AAUAAA (Kee e a 1991

Cs- iig o e U-ic a UG-ic eemes (ee Caaias a aai 1

aai 3 a amiy oeis iig o e G-ic eeme (Ai e a ese oeis ae ee eoe o ae eesie ieacios i e

oyaeyaio maciey (ouoo

3 aases a osiio Seciiciy Scoe Mai

A comeesie aaase o oyaeyaio sies cae oyA_ (ag e a

5; ee e a 7 as ee cosuce e aaase coais 93 uma geomic sequeces wi eg suouig e oy(Α sies (-3 o +3

wic coeso o 139 gees Oe eame sequece is sow i igue 3 a e sequece aways sas om 5 e o 3 e oy(Α sies i e oyA_

aaase ae ee ieiie y aigig e uic comemeay A (cA a eesse sequece ags (ESs wi geome sequeces usig a meo

escie i (ia e a 5

igue 3 Oe eame sequece om oyA_ e oy(Α sie is ocae i e mie o e sequece e gee i is iicae ae "" a e oy(Α sie is iicae y e ume ae e as eio

Sice e geomic sequeces ca o e iec iu o e cassiicaio agoim osiio-seciic scoig maices (SSM o 15 cis-eemes (u e a

5 wee use o seac sequeces a coe e sequeces o umeica aues

e SSM o AUE1 cis-eeme is sow i igue 33 Eac scoe i e mai

eecs ow equey e uceoie occus a e gie ocaio o e cis-eeme 15

I meas e uceoie cou o e a a ocaio ase o e osee sequeces

Figure 3.3 e SSM o AUEI e ume a e egiig o e ow is e ocaio o e uceoie i e cis-eeme e scoes eec e eaie equecies o eac uceoie occuig a e gie ocaio I meas e uceoie cou o e osee a is ocaio ase o e sequece aa

Sice e 15 cis-eemes sa o a sequece wi eg e scoe o eey cis-eeme was cacuae y e oowig omua

S= ma Σ =1 wee m; is e scoe o uceoie i a osiio j i e SSM is e eg o e cis-eeme a e ma ucio is o e sequeces wi eg i e cis-eeme region. o eame o AUE e scoe is cacuae i -1 o -1 egio I

ee eiss a I aue S is se o e -15 as i is e owes aue osee i e

aiig aa wic wi e iscusse i a ae secio a is o a sequece wi

eg equa o i is coee o a eco wi 15 aues coesoig o 15 cis-eemes

3.3 Correlation Between Fifteen Cis-elements

o uesa e eaiosi amog cis-eemes we aie a ieacica cuseig meo o gou e 15 cis-eemes ase o ei occuece o a e sequeces i e oyA_ scoes o e 15 cis-eemes wee cacuae ase o 6

e mie o e sequeces a ey ome a mai wi imesios

9315 Usig easo coeaio as e meic a aeage ikage o ee

uiig (Eise e a 199 e 15 cis-eemes ca e agey iie io wo gous (see igue 3 Oe gou cosise o CUE1 CUE CΕ ΑUE ΑUE3 a ΑUE wic ae a useam eemes ece CΕ a e oe cosise o

owseam eemes ece AUE 1

igue .4 Cuseig o 15 cis-eemes usig oy(Α sies om oyA_ SSMs o 15 cis-eemes wee use o seac uma oy(Α sies om oyA_ i ei esecie egios Eac ow is a oy(Α sie (93 i oa a eac coum is a cis-eeme ieacica cuseig usig easo coeaio was emoye o cuse cis-eemes a e esuig ee is sow o o o e eama e gay scae is iicae a e oom o e o

is gouig is ous as cuseig usig oe aamees suc as

Keas au coeaio a comee ikage aso esue i simia gouigs 17

us useam eemes a owseam eemes i geea ae iee oies iicaig a ey may comesae o eac oe uig mA oyaeyaio I

iocemica ems is esu suggess a weak useam sigas ca e "ee" y sog owseam sigas a ice esa owee ue eeimes ae eee

o coim is moe

3.4 Training, Testing, and Prediction

o cassiicaio ee ae wo ses is ai e kow aases a geeae e

aie moe Seco use e aie moe o eic e esig aase a aaye

e accuacy e aiig aase use i is suy cosise o sequeces wi 1 osiie sequeces aomy seece om oyA_ a 1 egaie sequece geeae y is-oe Mako cai moe (MCM (see Aei A1

om e osiie sequeces e esig se is geeae e same way wi e

aiig se as e ume o sequeces ayig o iee uoses A e scoes o aiig aa a esig aa wee scae y (S-s/σS wee s is e mea aue o

S o a osiie a egaie sequeces i e aiig aa a σS is e saa

eiaio o S o a osiie a egaie sequeces i e aiig aa

o aiig oy e mie wee use ecause we aeay kow a

e oy(Α sie is ocae a e mie o e osiie sequece e ue egaie sequece is a o ge a i is assume a aom sequeces coai ey ew

oy(A sies eeoe e MCM was use o geeae e egaie sequeces o

i eg y usig SSMs eey sequece o geeaes a eco wi

15 aues eeoe wo gous o aa wi 15 imesios wee geeae e aa

om osiie sequece ae aee wi 1 a e aa om egaie sequece ae 8

aee wi -1 ese wee e iu o e cassiicaio agoims Ae aiig

e moe was geeae a eay o acce esig aa o eicio

o esig a sequece om e oyA_ aaase we eee o o kow e ocaio o e oy(Α sie a use e aie moe o eic i e sequece coais a oy(Α sie o o e sequece eg is a eey

susequece was use o geeae e aues Muie-i issue aises a is

iscusse i e ae secio

.4. Comaiso o iee Meos

ee ae may cassiicaio meos o aess wic oe wou e use o ou

eicio a oe-i moe was geeae o es o simiciy a is o esig we oy use e mie o geeae e eco scoes So o eac sequece we oy ee o eic oce a e esu is oy o eic i e mie sie o e sequece is a oy(Α sie o o

I ig o is we ese ee iscimia aaysis meos amey iea

iscimia aaysis (A quaaic iscimia aaysis (QA (see Aei

Α a suo eco macie (SM (see Aei Α3 A is a

ye-ae o seaae wo o moe casses wi iea comiaio o aiaes

(ag I assumes a e aa isiuio o eac cass is oma a a a casses ae e same coaiace QA uses a quaaic suace o seaae casses

(ag a aso makes e assumio o a oma isiuio u eaes e

equieme o coaiace SM emoys kee ucios o seaae aa y a

ye-ae a is suoe y ecos yig a e ouaies o casses (oes a

aik 1995 A ese meos ae ee use o ioogica sequeces o ieiicaio o sigas suc as sice sies (ag ; ag e a 3; Yeo e a

ο comae ese meos we aomy seece oy(A sies om e

oyA_ aaase eiee e —1/+1 geomic egio suouig eac

oy(A sie a use em as a osiie aase We e aomie e osiie

aase y a is-oe MCM o oai egaie sequeces eac wi i

eg Usig A QA a SM ucios i ogam (see Aei we comae ei eomace o eicio o oy(Α sies wi esec o sesiiiy

(S seciiciy (S a coeaio coeicie (CC (see Aei AA As summaie i ae 31 o ese ee meos QA aciee e es sesiiiy a SM aciee e es seciiciy ase o 1 aom ess -aues ae cacuae ase o wo-aie -ess comaig 1 aues om A o QA wi

ose om SM Oea SM as e es eomace uge y CC us we

ae seece SM as e eicio meo o ue suies

ae . Comaiso o A QA a SM

S S CC Meo Mea -aue Mea -aue Mea -aue A 3 3E-1 75 1Ε-99 3 1Ε-7 QA 3 13Ε- 7 11E-1 3Ε- SM 3 - - 93 - LDA, linear discriminant analysis; QDA, quadratic discriminant analysis; SVM, support vector machine; SN, sensitivity; SP, specificity; CC, accuracy. The p-value is calculated based on t-test comparing 100 values from LDA/QDA with those from SVM.

.4.2 eaeOeOu Meo

o eamie ow eac eeme coiues o e eicio a wee o o we ca

euce e ume o aiaes we couce a eae-oe-ou eeime wee we

e ou oe eeme a a ime a cacuae is eec o S S a CC i oy(A sie eicio We easoe a omissio o imoa eemes wou sigiicay 20

owe e eomace o eicio weeas omissio o oesseia oes wou

o make muc ieece

As sow i ae 3 we ou a CUΕ is e mos imoa oe as omissio o CUΕ e o susaia o o o sesiiiy a seciiciy wic is cosise wi e oio a e AAUAAA eeme is ciicay imoa o mA oyaeyaio

ae .2 Eecs o eaig Ou Some Cis-eemes

SN S CC Element(s) Mea Μ α left out P-value -aue Mea -aue ( ) ( ) oe 3 - - 93 - AUEl * 33 1 91 ΑUΕ 35 7 0.11 9 9 ΑUΕ3 3 7 1 9 AUE4* 7 9 9 93 CUE1 39 1 3 7 5Ε-3 CUE2 53 99Ε-11 715 7Ε-153 395 Ε-175 CDEI * 84.1 0.14 5 91 0.16 CDE2 91Ε-17 3 3Ε-1 1Ε-19 CDE3* 1 5 7 93 CDE4* 31 9 35 ADE 1 * 7 0.18 9 ADE2 * 7 9 0.10 ADE3 37 5Ε- ADE4* 3 5 7 17 9 ADE5 83 9 Ε-3 7 1 9 elements 82.1 11 Ε- 84.5 0.671 3 Ε-17 Element(s) were left out at both the training and testing stages. None, no elements deleted; 9 elements, leaving out 9 elements marked with asterisk in the table; AUE, auxiliary upstream element; CUE, core upstream element; CDE, core downstream element; ADE, auxiliary downstream element. P-values are calculated based on 30 tests comparing with none. Elements marked with an asterisk indicate that their individual omission would not lead to significant change (p-value > 0.05) of SN, SP, or CC.

Omissios o CUE 1 o CΕ a simia eecs aei o a esse ee iicaig a U-ic eemes suouig e oy(Α sies ae imoa

eemias I aiio omissios o AUE3 ΑΕ3 o AE5 cause os i 21 sesiiiy iicaig ei imoa oes i oy(Α sie seecio o e es o e

9 eemes (iicae y a aseisk i ae 3 eaig ou ay sige eeme cause some ecease i eicio eomace wie oe o em aeae o e sigiica ase o a -es (-aue 5 ae 3 owee eaig ou a 9 eemes mae o sesiiiy a seciiciy o sigiicay us ese 9 eemes may coiue o oy(Α sie ecogiio cooiaey a some eemes may e imoa o oy a sma suse o oy(Α sies

ake ogee ese aa iicae a e 15 cis-eemes ae ecessay o

oy(Α sie eicio aiaig e ucioa imoace o ese eemes o

oyaeyaio I aiio e ac a muie aiaes ae equie o oy(Α sie eicio suggess a comiaoia mecaism o oy(Α sie ecogiio i

uma ces

3.4.3 Polya_svm

o sequece wi eg geae a e oe-i moe is o eoug sice we o o imow wee e oy(Α sie is So e eicio as o e caie ou a eac ossie osiio o a gie sequece a a muie esig issue aises we

eicig e ikeioo o a sequece coaiig a oy(Α sie a is o eac

ocaio we cou eic i a ocaio is a oy(Α sie o o owee e

oem aises a ee i e eicio is goo o a og egaie sequece we cou eic may oy(Α sies isie e sequece o eame i e ue

egaie ae is 95% o a 3 og sequece wi 1 sies i is ossie o ae 5

ocaios o e eice as oy(Α sies Some oe meos ee o e iouce

o eimiae is ki o oise I as ee ou a oyaeyaio ceaage is usuay eeogeeous a occus i a wiow ae a a a eme osiio (ia e a 5 Α sime es was oe ase o 1 sequeces 5 osiie sequeces 22 randomly selected from polyA_DB and 50 negative sequences generated from the first-order MCM. The result is shown in the Figure 3.5.

Figure 3.5 Heatmap plot of predicted probability results of 100 test sequences. The poly(A) site is set to be 0, and the downstream direction is referred to as the positive iecio e osiie sequeces coaiig oy(Α sies ae aomy cose om oyA a egaie sequeces o coaiig oy(A sies ae geeae y a is oe MCM wi eg e scae is iicae a e oom o e o

Based on this, we designed a window-based scoring scheme to address the multiple-hit issue: we required a region M of m nt to have probability greater than 0.5 at each position in the region and another region N of n nt within M to have high probability values. The region M is called a positive region, and the region N is called a high probability region (HPR) (see Figure 3.6). Using different combinations of m and n, we found that m=30 and n=10 achieved the best performance when the product of the 10 probabilities for HPR was set to be greater than 0.5. Thus, on average, the probability of being positive for each position is greater than 0.933 in HPR. 3

igue 3 Scemaic o 15 cis-eemes i e oy(Α egio a e seac agoim o oyasm A oy(Α sie is iicae y a eica aow e eice oaiiy i e osiie egio is geae a 5 a e ouc o e eice oaiiies i e ig oaiiy egio is geae a 5

Moiae y ou iiia esus we eeoe a sa-aoe ogam cae

oya_sm usig e E aguage (see Aei 1 is ogam uses e 15 cis-eemes a SM o eic e oy(A sies o SM eicios e SM

iay ISM (see Aei Α3 was use wi C-suo eco cassiicaio

(C-SC meo a e aia asis kee ucio ( e eau seigs ie

=1 a gamma= / 15 wee aie I ac we ie iee aamee seigs a we ou ee was o ig ieece o cacuaig oaiiies we aie a wiow-ase ausme meo usig oaiiy aues geeae y ISM

e oaiiy o aig a oy(Α sie a osiio i is eemie y is E-aue cacuae y E. =1— ΠΡ (j) wee j is a osiio eaie o i, ( is e I

oaiiy o osiio j, a j e {- —3 — —113 5} us e E-aue is

ase o a 1 ceee a i a e ige e oaiiy e owe e

E-aue I aiio we aso equie a e egio (-15/+15 aace o a osiie sie o ae (i 5 a eey osiio o is sequece

E-aues o 1 osiie a 1 egaie sequeces ae sow i a

eama (see igue 37 accoig o e gay scae sow a e oom Eac

sequece is i eg A osiie sequeces ae a oy(Α sie i e mie

a some may ae muie oy(Α sies oy(Α sie eicio was caie ou i

e 11 o 5 egio us eac ow coais E-aues e -ais is e

osiio i e sequece As sow i igue 37 usig is meo oya_sm ca

eeciey eimiae ase osiie sies a accuaey ocae ea oy(Α sies

igue 37 eicio o osiie a egaie sequeces E-aues o 1 osiie a 1 egaie sequeces ae sow eseciey e gay scae is sow a e oom o e igue Eac sequece is i eg a e oy(Α sie is se a ocaio 3 oy(Α sie eicio was caie ou o e 11 o 5 egio 25

3.5 Prediction of Poly(A) Sites in the Human Genome

We ese oya sm usig a uma oy(Α egios i e oyA_ (93 i

oa a comae is eomace wi oyaq a commoy use oo o oy(A sie eicio (aaska a ag 1999 o oya sm i e eice ocaio

(mie o is wii om a ea oy(Α sie e eicio was cosiee as ue osiie (ΤΡ a oewise as ase egaie ( o oyaq sice i uses e AS ocaio o oy(Α sie eicio we cosiee a eicio

o e ΤΡ i a AS is wii useam o a ea oy(Α sie As sow i ae

33 oya_sm is 33% moe sesiie a oyaq (5% S s 395% S

We e iie oy(Α sies io iee gous ase o wo cieia ei usage a ocaio o oy(Α sie usage we use e ume o suoig cA/ESs o a oy(Α sie o eemie is equecy o usage oy(Α sies i gees wi oy oe oy(Α sie ae cae cosiuie sies oy(A sies i gees wi muie sies wee goue io sog weak a meium A sog sie is use moe a 75% o e ime ase o suoig cA/ESs I a gee as a sog sie oe sies ae cae weak sies I a gee oes o ae a sog sie a sies ae cae meium sies o oy(Α sie ocaio we is seaae oy(Α sies ocae i ios a iea eos (cae useam sies om ose i 3-mοs eos a

e iie oy(Α sies i e 3-mos eos io ee gous eeig uo

ei ocaio (see ae 33 e 5-mos sie is cae e is sie e 3-mοs sie is cae e as sie a sies i ewee e 5-mos a e 3-mos sies ae cae mie sies I aiio i a 3-mos eo coais oy oe oy(Α sie e sie is cae a sige sie oya_sm was oe 5°/o moe sesiie a oyaq o

eecig meium a weak oy(Α sies a aou 195% a 7% moe sesiie

a oyaq o sog a cosiuie oy(Α sies eseciey (see ae 33 26

ae . Comparison of Polya_svm with Polyadq for Different Types of Positive oy(Α Sies

polya_svm polyadq SN Poly(A) site SN SN Di type ΤΡΡ FN ΤΡ FN (ο (%) o Total 15,469 13,814 52.8 11,563 17,720 39.5 +33.8 Strong 1,602 655 71.0 1,341 916 59.4 +19.5 Medium v 8,301 8,221 50.2 5,488 11,034 33.2 +51.3 ζ Weak 151 2,565 37.2 961 3,125 23.5 +58.3 Q. Constitu tive 4,045 2,373 63.0 3,773 2,645 58.8 +7.2 Single in 3'-most 4,941 2,853 63.4 4,516 3,278 57.9 +9.4 exons First in 3'-most 2,469 3,583 40.8 1,522 4,530 25.2 +62.2 ο exons 2 Middle o in 3-mοs 2,256 2,479 46.7 1,212 3,523 25.6 +86.1 exons Q ,' Last in 3'-most 3,763 2,289 62.2 2,839 3,213 46.9 +32.6 exons Intron and internal 2,040 2,610 43.9 1,474 3,176 31.7 +38.4 exons o oya-sm a sequece is eice o e ue osiie (ΤΡ i a eice oy(Α sie (e mie osiio o is wii om a ea oy(Α sie o oyaq a sequece is cosiee ΤΡ i e sequece is eice o e osiie a e ea oy(Α sie is wii owseam o a oy(Α siga AAUAAA o AUUAAA See e o esciio o iee yes o oy(A sies

As o oy(Α sies ocae i iee egios o a gee oya_sm aso mae moe sesiie eicios a oyaq i a caegoies aicuay o oy(A

sies ocae i e useam o e 3-mos oy(Α sies (% a 1% moe sesiie o e is a mie oy(Α sies i 3-mos eos A moe eaie aaysis eeae a e ig sesiiiy o oya_sm is maiy ascie o is 2 caaiiy o eic oy(Α sies wiou AAUAAA o AUUAAA a sies wi weak owseam sigas ake ogee ese aa emosae a oya_sm is

igy sesiie o oy(Α sie eicio

oya_sms eomace o sequeces wiou oy(Α sies o egaie sequeces was aso eamie A ue egaie sequece is iicu o oai as ee is o eesie eeimea eiece o egaie sequeces us we ese seea

yes o sequeces a wee esume o ae ey ew oy(Α sies icuig

aomie oy(Α egios (-3/+3 aomie geome sequeces mA coig sequeces (CS a 5 uasae egios (Us

As sow i ae 3 comaae ase osiies ( wee eice y

oya_sm a oyaq o aomie sequeces owee oya_sm eics sigiicay moe sies a oyaq i CS a 5U sequeces (moe a

o Ieesigy e ieece was o sigiica we aomie CS a

5U sequeces wee use suggesig a some o e ase osiies i CS a

5Us eice y oya_sm may acuay e ue osiies Accoigy i is

emig o secuae a ee may i ac eis a age ume o oy(Α sies i

CS a 5Us is wou e cosise wi eious iigs o oy(Α sies i iea eos (ia e a 5; a a Ma 5 owee is yoesis as ye o e ese y we a eeimes O e oe a a igy sesiie meo wou ai i e ieiicaio o ese sies wic ae iicu o eec y cA/ES-ase aoaces as may o ese oy(Α sies wou esu i aea

ascis a may e aiy egae y ceua sueiace mecaisms suc as

ose wiou a i-ame so coo (iscmeye e a 28

ae .4 Comaiso o oya_sm wi oyaq o iee yes o egaie oy(A Sies

oya_sm oyaq S CC egaie S Ρ i Diff Se ΤΝ CC CC (% (% (% oy(Α egio 77 37 7 7 79 +51 +79 is-oe MC Geome is-oe 5 3 51 7 53 79 33 +9 +59 MC CS 1 9 7 3 5 5 35 -1 -3 CS is-oe 7 13 953 51 5 15 93 7 + +3 MC 5 U 5 55 1 917 37 -17 +5 5 U is-oe 91 9 97 57 9 91 7 + +17 MC MC Mako cai oy(A egio is-oe MC aomie -3 o +3 sequeces suouig oy(A sies; geome is-oe MC aomie uma comosome 1 sequece; CS coig egio sequeces o uma eSeq sequeces; CS is-oe MC aomie CS; 5 U 5 uasae egio o uma eSeq sequeces; 5 U is-oe MC aomie 5 Us o eac egaie se 5 sequeces wee geeae a eice y oya_sm a oyaq e ocess was eeae 1 imes a e mea aues ae esee i e ae Ρ a o ae 33 o a oy(A sies i e oyA- aaase wee scae a use o cacuae CC

.6 ue Imoemes o oya_sm

We ae mae seea cages o e oya_sm ogam a imoe e accuacy o eicio y 1% oe is is eease oya_sm 1 (Ceg e a Iiie

aues ae geeae we a cis-eeme oes o eis i a gie sequece owee

o e acica use o SM eac aiae equies a umeica aue I oya_sm

1 iiie aues wee eace y e owes scoe o a cis-eemes i e

aiig aa ie -15 o eamie i is asec ca e imoe we ae ie ee oe oios (see ae 35 Meo 1 is e oigia meo wee a iiie

aues wee se o -15 I Meo e iiie aues wee se o eo I Meo 3 29

e iiie aues wee se o e e owes aue a a gie cis-eeme ca aciee

I Meo e iiie aues wee se o e a aue o - wic is e owes

aue o a cis-eemes eicio was eome 3 imes o eac o e ou

iee seigs a e eaie CC esus ae sow i e ae 35 A aues wee omaie o Meo 1 e oigia meo As sow i ae 35 oya_sm

a e es eomace we e Meo was use wic is % gai i accuacy oe e oigia meo is imoeme iicaes a e ye-ae i cassiicaio is o eie we y e suo ecos i e aeas wi ow egaie

aues

Table 3.5 Comaiso o iee Meos o eacig Iiie aues

eaie eaie eaie ye S S CC Meo 1 1 1 1 Meo 1 9 95 Meo 3 1 1 13 Meo 1 11 1 Method 1 is the original method, where all infinite values were set to —15. Method 2 sets infinite values to 0. Method 3 sets infinite values to the lowest value a given cis- element can achieve. Method 4 sets infinite values to an arbitrary value of -6.28, which is the lowest value for all cis-elements. Prediction was performed 30 times for each of the four different settings and the relative CC results are shown in the table. All values are normalized to Method 1, the original method.

We ae use wo egios o oy(Α sie eicio o esue ig seciiciy ie a 3 osiie egio a a 1 (Ceg e a o eamie wee is se ca e simiie wiou oosig eomace we ae

ese meos oy usig wi iee sies (1 o 99 I aiio we use

e omua s = —og (Π p o eie a scoe o a wee p, is e oaiiy

aue o osiio i wic is oie y ISM a n is e sie o e We aso seece e oima oya_sm cuo scoe ase o e iges CC aue o eac seig As sow i igue 3 we ou a o 3 a cuo 0 scoe o aea o e e oima o acieig iges CC aue us we use

o 3 a cuo scoe o o eemie oy(Α sies i e ew ogam

(esio I aiio we seea egios oeae we seece e wi e es scoe o eese ei comie egio

igue .8 s eaie maimum accuacy e aies om 1 o 99 a e cuo aues ae [1 3 5 1 15 5 9 95 9 97 9 99 991 99 999] o ay gie aue e es CC ca e cacuae o a gie cuo aue e -ais is a e y-ais is e omaie CC o iee usig e oimie sie (3

o eamie i ee is coeaio ewee oya_sm eicio scoe a e seg o a oy(Α sie we iie 93 oy(Α sies oaie om oyA_

aaase io seea gous ase o e usage o a oy(Α sie sog sie (S cosiuie sie (0, meium sie (M a weak sie (W as escie i secio 35

e usage is ase o e ume o suoig SSs om o-omaie cA

iaies as escie i (u e a 5 As sow i igue 39 a coeaio is

isceae o e oya_sm eicio scoe a oy(A sie usage iicaig a

e scoe ca e use as a guie o eemie e seg o a oy(Α sie We aie e -es o S s M a W a e -aues ae ess a 1e-1 a is e soge e oy(A sie e smae e oya_sm scoe

igue . oo o oya_sm scoes o iee yes o oy(A sies S e sog oy(A sie; e cosiuie oy(A sie; M e meia oy(A sie; W e weak oy(A sie e y-ais is e oge aue o e eice oya_sm scoes CHAPTER

SAGE DATA ANALYSIS

4.1 Introduction to SAGE

Seia Aaysis o Gee Eessio (SAGE (ecuescu e a 1995 is a

ig-ougu a ig-eicie meo o simuaeousy eec a measue e eessio ees o gees eesse i a ce a a gie ime Comae wi micoaay eeime e SAGE is eesie o eom a i equies ige

eciques e ie ses o SAGE ae sow is igue 1 a e eaie

ooco ca e ou om e wesie http://www.sagenet.org/protocol/index.htm

(isie o 7//

Figure 4.1 A ouie o SAGE SAGE sequeces a cous a so ag wi 1-17 ocae a e 3 mos o e mA ae e ag CAG usig wo esicio eymes y comaig e ume o e ags SAGE ca esimae e eaie eessio ees o iee ascis i e ce igue ake om oweoi eseaio y Kei Coomes Secio o ioiomaics eame o iosaisics M Aeso Cace Cee Uiesiy o eas

3

e oigia SAGE aoac (ecuescu e a 1995 geeae a 1 ase

ai ( ag (eee o as a so SAGE ag eie om e 3-e o e

asci y usig e aggig eyme 1aIII ae o usig a ew aggig eyme

Mme (Saa e a a 17 y ag (eee o as a og SAGE ag was geeae a eace e seciiciy o e maig om e SAGE ag o e asciome

e coecio o a A ascis ue o e eaie ioucio o so SAGE

ag e uic SAGE aaase as moe so SAGE iaies a og SAGE

iaies o e uma aase Mos o e ags ae uiquey-mae o some gee

u ee si eiss a sma oio o ags aig moe a oe gee eig mae

o em o oe ag i is ecessay o i e es gee mae o is ag ase o

e eiece o ES/cA ouaey is ki o wok as ee oe y SAGE

Geie (oo e a a e aaase is uae eguay oeeess eos occu i e eeimea aa ue o a o o acos suc as sequecig a C

osiie a egaie eguaio o iee gees a oeis ae ey commo eomea i ioogica sysems suc as aciaos o iiios a ae

ee suie wiey (Ma e a 199; Koeky a Myug 1; Ku e a 5

owee e eguaio o aeaie ascis om e same gee as seom

ee iesigae

Wi e ogess i geome aoaio a emegece o age amou o iomaio wi ega o cA/ESs suyig e eguaio o iee

ascis om e same gee ecomes ossie a may ieiy some ukow mecaisms o gee eessio iicuies si ie i e accuae measueme o eessio ees o e ascis age-scae eessio oos suc as micoaay

ae ee use o aaye e co-eessio ees o kow comemeay

ascis (ikaio e a 3 owee i is o eecae o ose mA 3

ascis a ae o imie i e cis Seia Aaysis o Gee Eessio

(ecuescu e a 1995 is a aeaie ecoogy o sysemaicay eec e goa gee eessio ees ee ae a o o aaages i usig SAGE (uea a uea a mos imoay SAGE ca eec e eessio ees o

iee ascis wiou io kowege o e gees wie micoaay ca oy

eec e imie asci eessio ees

Aaysis ieie

Moiae y e oseaio om e SAGE aa a gee cstf-77 as wo

ascis wi oosie eessio ees (a e a we esige a ieie o sysemaicay iscoe e gees wi simia eessio aes e aa ow o suc iscoeies ca e eesee y igue a ey ae iscusse i eai i

e oowig secios

4.2. eiae iaies Seecio

e SAGE aa wee owoae om e SAGE Geie (oo e a see (see Aei C1 ue o e oise a eeimea aiaio o iee

iaies o a e iaies cou e use i is suy Eac iay coais e

oowig iomaio e sequece ags e equecy o eac ag e oa ume o ags e ume o uique ags e issue ye o e iay ec A eame o

e SAGE aa is sow i ae 1 35

Figure 4.2 e ieie o SAGE aa aaysis e oise eiss i e iaies a e maig om ags o e uigees y ieig e ueiae iaies a ags e sigiica gees ca e seece om e osee coeaio ewee wo ags om e same gee e sigiicace is ase o e aomie coeaio ewee ay wo ags o ecessay om e same gee

Table 4.1 SAGE aa Eame

The third column indicates the number of tags from the first column found in the libraries indicated in the second column. For any given library, the number of unique tags for this library is the number of rows corresponding to this library and the total number of tags is the summation of the third column corresponding to this library.

A og o so iay is cosiee o e eiae i e oowig wo coiios ae saisie

• e oa ume o ags is geae a 5 o wic aou 5% o e ags ae missig (ecque e a 3

• e aio (oge o e oa ume o ags iie y oge o e ume o uique ags is ocae i e iea (mea- s mea+ s wic coesos o a 955% coiece iea wee e mea is e aeage aio o a og SAGE iaies o a so SAGE iaies a s is e saa eiaio o e aio o a og SAGE iaies o a so SAGE iaies

We ca see a ese wo coiios ae easoae om e o o og a so iaies (see igue 3 I igue 3 e eica ie eese e cuo ie o is coiio a e ee skew ies om o o oom eeses (mea+s

ie e mea aue ie a (mea- x s ie e soe o e ies is e mea

aue o e aios e seco coiio was mae ase o e assumio a e aeage eessio ee o e woe iay sou o e oo iee ewee e

iaies

e oa aase coais 9 uma so SAGE ag iaies a 59 og

SAGE ag iaies Ae ayig ese wo coiios o e og a so iaies

eseciey 1 so iaies a 9 og iaies wee seece e seece

iaies ae ocae i e egio ecose y e eica cuo ie (mea-s

ie a (mea+s ie (see igue 3 e og iaies sou e moe

eiae a e so oes a i was coime y e eaio (9/59 19

4.2.2 Is SAGE aa issue Seciic

o iesigae i e SAGE aa ae some ias o iee issues we oe e

aio (Ιοg o e oa ume o ags iie y og o e ume o uique ags

esus e issue oe om eiae so a og SAGE iaies o uma aase

om e igue e aeage gee eessio ee o sem ces is ige a

o oe issues i e igue wie e gee eessio ee o e ceeeum is

eaiey owe e ioogica eaaio is ucea o us 37

igue 3 Coeaio ewee e ume o uique ags a e oa ume o ags e aa ae om uma SAGE iaies e eica ies eese e is coiio a e ee skew ies coeso o e seco coiio ey ae use o e seecio o e eiae iaies

igue e gee eessio ees o iee issues (a eiae og ag SAGE iaies o umas ( eiae so ag SAGE iaies o umas Eac coum eeses oe issue Eac a o e o eeses e ume o iaies o a issue a e igue o e oom is e oo eecig e aeage eessio ee o a issue 8

4.2. e eiae ags

Sice e maig om a SAGE ag o a uigee a asci cuse is o uique we ee o geeae e eiae ags o eac uigee Oe uigee may coai moe

a oe SAGE ag ue o iee ascis eogig o e same uigee Usig

e aaase ies s_oges_gee maig e og ag o e es uigee

s_soes_gee maig e so ag o e es uigee s_ogequecies icuig e og ag equecy aa a s_soequecies icueig e so

ag equecy aa we geeae e eiae ags ase o e oowig wo coiios

• I a so ag was mae o a uigee y ie s_soes_gee a og ag wose is 1 is e same as e so ag was aso mae o e same uigee y ie s_oges_gee e e og ag was cosiee o e a caiae o e eiae ag o is uigee I sou e oe a e eiae ag is a og ag

• o a gie ag some iaies may o coai e equecy aa o is ag e ag is cosiee o e a eiae ag i is aio (e ume o iaies coaiig e ag iie y e ume o oa iaies is geae a 5% (aiay eceage ausae o moe sige coiios is ca eimiae a o o ags oy eesse i a ew iaies

o eame GCGAGAAAAAACA was mae o gee cstf-77 y ie

s_oges_gee a eesse i iaies om s_ogequecies;

GCGAGA was aso mae o gee cstf-77 y ie s_soes_gee a eesse i 1 iaies om s_soequecies GCGAGAAAAAACA saisies e aoe wo coiios so i is cosiee o e oe eiae ag o gee cstf-77. ee ae a oa o 93 uigees eesse i ese eiae iaies Ae ayig e is coiio ee ae 1755 uigees e uemoe ayig e seco coiio ee emai 95 uigees

4.2.4 Sigiica Uigees

Oce we geeae e eiae ags e eiae og SAGE iaies a so SAGE

iaies ca e mege sice e so ag equecy aa ca e eae as e coesoig eiae og ag equecy aa A e oowig cacuaios ae

ase o e 5 (1 so + 9 og eiae iaies mege iaies o ieiy

e egaiey eguae ags we ee a eas wo eiae ags o eac uigee

Amog e 95 uigees a ae a eas oe eiae ag oy 1957 uigees ae moe a oe eiae ag o eac uigee e easo coeaio o e ag e miio (M aa (see Aei C ewee ay wo ags was cacuae a is i e uigee as m eiae ags we ee o cacuae m(m-1/ coeaios ewee

ese m ags o eac coeaio we cacuae e -aue ase o e es o ieeece (see Aei C I e eessio ees o e wo ags ae oosie

o eac oe e e -aue sou e egaie I we ay a saa -es o

ese egaie -aues usig a -aue = 1 e 1 uigees ae seece o coai a eas wo ags a ae egaiey eguae

o ge e sigiica egaie eguaio we ee o kow e -aue

isiuio o aom ags o oai is we aomy seece wo ags om a

eiae ags 1 imes Eac ime we sue e aa 1 imes ase o a

ise-Sue (see Aei C a cacuae e easo coeaios a e

-aues We ou a e aom -aues ae ieay coeae wi e egees o eeom (see igue 5 wic ae e eesse iay umes sae y a

eas oe o e wo ags Siy-oe sigiica uigees wee seece i we se e

-aue cuo as 1 e -aue is cacuae ase o e aom isiuio o seciie egee o eeom o eame i e -aue is o o some ag ai wi

egee o eeom q om e aom simuaio we ae e aom isiuio 40

o egee o eeom q e -aue o o is cacuae y e ume o aom

aa ess a o iie y e oa ume o aom aa e eaie iomaio

o ese 1 uigees is sow i Aei C3

igue 4. -aue isiuio o aom ags Eac oi eeses oe ag ai e -ais is e ume o iaies i wic a eas oe ag is eesse e y-ais is e -aue o e ag ai cacuae om e easo coeaio

We icke u ie ieesig a we-kow uigees (see ae

yig o uesa e mecaism o e egaie eguaio Mos o em ae e

AA iig oeis oug geeaig e aeaie ascis ey may

eguae e iig aciiy a ce ucio o see is ceay we geeae e

asci sucues coesoig o ese uigees usig ioe (Saic e a

e ascis wee aige o e geome a e eiae SAGE ags wee

ocae Oe eame o gee gρ3 is sow i igue e os o e oe eig uigees ae sow i Aei C 1

ae ie Sigiica Uigees wi egaiey eguae ascis

>Hs.334885 GTPBP3 GTP binding protein 3 (mitochondrial) FLJ10330 PRP38 pre-mRNA processing factor 38 >Hs.342307 (yeast) domain containing B HNRPD Heterogeneous nuclear ribonucleoprotein D >Hs.480073 (AU-rich element RNA binding protein 1, 37kDa) HNRPD Heterogeneous nuclear ribonucleoprotein D >Hs.480073 (AU-rich element RNA binding protein 1, 37kDa) ΑΤΡΑ Aase Ca++ asoig caiac musce >Hs.506759 sow wic >Hs.516539 ΡΑ3 eeogeeous ucea iouceooei Α3 >Hs.517262 SON SON DNA binding protein >Hs.529798 ΒΤ3 asic asciio aco 3 >Hs.531106 ΒΜ5 A iig moi oei 5 >Hs.546361 AΡC1 Aase Ca++ asoig ye C meme 1 A unigene is the transcript cluster. First column is the unigene I and the second is the description of the unigene, including gene name and gene function. The information of the tags and correlations is given in Table C.1 of Appendix C.

igue e asci sucue o Uigee s335 e is ie eeses e comosome iomaio o e gee a e oowig ies eese e asci sucue geeae om is gee e SAGE ag is iicae y e u-iage a e accessio ume o e asci is iicae ue e is eo o e asci

3 esus a Cocusios

Amog e 1957 uigees a ae moe a oe eiae SAGE ag 5 uigees

ae ags a ae ocae i iee sas o e comosome Amog ese 5 uigees 3 uigees (iicae i o yeace i Aei C3 ae ou i e 1 42 sigiica uigees Mos o e ascis o iee sas ae co-eessio

o egaie eguaio Simia esus aso wee ou i e ieaue (Quee e a

a is e egaie eguaio is o maiy ue o e oieaio o e

ascis u isea i mus e eguae y some oe ukow mecaisms

om e aa o maig e SAGE ags o e oy(A sies (iae aa

om i ias a amog e 1 sigiica uigees 15 uigees wee ou

o coai iee oy(Α sies uigees wee ou o use e same oy(Α sies a e oy(Α sie iomaio o e emaiig uigees is ukow ue o some easos suc as e iee uigee esio ewee SAGE Geie a ou

aaase owee i suggese a e aeaie oyaeyaio may e oe

aco a geeae e oosie eessio ae o iee ascis

o ou kowege is is e is sysemaic suy o e egaie eguaio

ewee wo iee ascis om e same gee a is may oie some isig io aeaie oyaeyaio ee we ae maiy ocuse o e uma

aa e mouse SAGE aa ae aso aaiae a ae age sie a e uma

SAGE aase We ca ay e same meos o e mouse SAGE aa a i e sigiica gees uemoe y a omoog aaysis ewee uma a mouse gees egaie eguaio o some gees may e ey sigiica CHAPTER 5

AEAIE OYΆEYAIO

5.1 Introduction to Alternative Polyadenylation

Α age ume o uma gees ae ee ou o coai moe a oe oy(A sie eaig o muie ascis ee oug some o em may o ae oei

oucs I was eoe ecey a aou 5% o uma gees a 3% o mouse gees ae muie oy(A sies (ia e a 5

Gees wee cassiie io ee gous accoig o e ocaios o ei

oy(Α sies (ia e a 5 (see igue 51 A ye I gee coais oy oe

oy(A sie; a ye II gee as muie oy(Α sies a ocae ae e so coo i e 3-mos eo; a ye III gee as oy(Α sies ocae i egios useam o

e 3-mos eo I is ossie a muie oy(A sies ae ocae i e 3 Us o e as eo o ye III gee us aeaie oyaeyaio o a ye II gee ca ea o aiae 3 Us wi e same oei oucs Sice 3 Us coai

A cis-eemes wic ae imoa o mA saiiy aeaie

oyaeyaio o ye II mig ay a ciica oe i gee eguaio Aeaie

oyaeyaio o ye III gees eas o aious oei oucs

o eac gee ye a ue cassiicaio was mae ase o e sie

ocaios (ia e a 5 1 S eeses a ye I gee oy(A sie; eeses

e 5-mοs oy(A sie i a ye II gee; eeses e 3-mοs oy(A sie i a

ye II gee; M eeses a mie oy(Α sie ewee a i a ye II gee; 3U eeses e oy(Α sie ocae useam o e 3-mos eo i a ye

III gee; 3S eeses a sige sie i e 3-mοs eo o a ye III gee; 3 3M 3

43 44

simia o Μ eese oy(A sies i a ye III gee o coeiece 3

eeses 3 S/3/3Μ/3

igue . Scemaic eeseaio o oy(A sies yes I, II, a ΙΙΙ eese ee gous o gees ase o e ocaios o e oy(A sies ye I gees coai oy oe oy(A sie; ye II gees coai muie oy(A sies ocae i e as eo ae e so coo; ye III gees coai muie oy(Α sies wi a eas oe oy(Α sie ocae i e useam egio o e as eo igue ake om (ia e a 5

e seecio o aeaie oy(A sies is eiee o e eae o

o-moecua ioogica acos suc as eeomea sages a ce coiios

(Ewas-Gie e a 1997 o eame e IgM eay-cai gee swices om

usig oe oy(A sie o aoe uig ymocye mauaio (akagaki e a

199 is esus i a si i oei oucio om a memae-ou om o a

secee om ue o e eeio o a C-emia yooic egio esosie o memae ieacios is swic is a esseia se i e immue esose

owee e eaie mecaisms ioe i aeaie oyaeyaio ae si ukow Wa ae e moecua acos a eemie o eguae e seecio o

oy(A sies? Ae ey oyaeyaio oei acos? Is a sicig aco ioe? 45

5.2 Protein Factors Involved in Polyadenyiation

Si iee as-acig oei acos ae ee ieiie as ecessay o i io

ceaage a oyaeyaio (Kee 1995 (see igue 5 ey ae e Ceaage

a oyaeyaio Seciiciy aco (CS (ieο e a 1991; Gimai a

eis 1991; Muy a Maey 199 Ceaage simuaio aco (Cs

(akagaki e a 199; Gimai a eis 1991 Ceaage aco Im a IIm

(CIm CIIm (akagaki e a 199; uegsegge e a 199 oy(A oymeise

(A (akagaki e a 199 a oy(A iig oei II (A II) (Wae 1991

CS is equie o ceaage a oyaeyaio a i coais ou

suuis (3 73 1 1 ka Amog ese CS-Ι is o e oy(Α siga

AAUAAA o is aias a ieacs wi e suui o Cs a CS-73 is

ioe i isoe-e-mA ocessig (omiski e a 5 A is aso

equie o o ceaage a oyaeyaio iiiaig e aiio o a oy(A

ai o e 3 e o e-mA (Muy a Maey 199 AII is o e

gowig oy(Α ai a acs as a eogaio aco eguaig is uimae eg

a simuaig mauaio o e A (Magus e a 3

Cs is comose o ee suuis (5 77 ka amey Cs-5 a

ieacs wi A oymeise II C-emius omai (C (McCacke e a

1997 Cs- a iecy is o U/GU-ic eemes o ascis coaiig

oy(Α sigas (ee Caaias a aai 3 a Cs-77 a as ee sow

o ieac wi seea acos ioe i ceaage a oyaeyaio o eame

Cs-77 ieacs wi Cs-5 (akagaki a Maey 199 Cs- (ao e a

CS-1 (Muy a Maey 1995 a A oymeise II C

(McCacke e a 1997 I aso ca ieac wi ise (akagaki a Maey

A scemaic esciio o e oy(Α egio icuig cis-eemes a some

oei acos suc as CS Cs a A oy II (oymeise II is sow i

igue 5 Scemaic eeseaio o e ses ioe i e mammaia e-ιιΝΑ 3 ocess ie oei comees ae ioe i e oyaeyaio ocess ceaage a oyaeyaio seciiciy aco (CS ceaage simuaio aco (Cs ceaage aco I II (CI CII a oyaeyaio iig oei II (ΡΑΒ II igue ake om (Kee 1995 4

igue . oei acos ioe i oyaeyaio C e c-emius omai CS e ceaage a oyaeyaio seciiciy aco icues e ou suuis 1k 73k 3k a 1k a Cs ceaage simuaio aco icues e ee suuis 5k 77k a k

CIm a CIIm ae equie o e ceaage eacio oy u ey ae ess we caaceie a e oe ceaage-oyaeyaio acos (uegsegge e a

199; uegsegge e a 199 CI m as ee ou o e e aco a ecuis e

CS a A o i wi e e-mA i e sequece oes o coai a caoica AS (ekaaama e a 5 CIIm as ee sow o ige wo oe ceaage acos (e ies e a

I aiio o e si mai acos aoe oe oeis aso ae ee sow

o e ioe i ceaage a oyaeyaio icuig C o A oymeise

II (ak e a ; Kaeko a Maey 5 Symeki (akagaki a Maey

C (Cao a Maey 1 (eai e a 1 /

(Ai e a Ssu7 (Seime a ow 3 UΑ5 (Mieoi e a

U 1 s-A (u e a 199; iis e a S (ou e a 199 a i (oes e a Some o ese acos ae ioe i o

oyaeyaio a asciio wic suos e oio a ese ocesses ae

igy coue ogee (Aamso e a 5 A o ese eieces sugges a

oyaeyaio ca e eguae y may acos a is igy coue wi oe

ocesses 48

. oose Mecaisms o Aeaie oyaeyaio

We kow a ye II gees a ye III gees ae muie oy(A sies e swic ewee aeaie oy(Α sies may e iee o ese wo iee yes

o ye II gees oy(Α sies ae a ocae i e 3-mos eo usuay i 3 U

eeoe someimes ey ae cae aem oy(Α sies ee is o aiioa sicig uo eacig e as eo Aso e isace ewee ese iee

oy(Α sies is usuay o og a mos e eg o e as eo o ye IΙI gees e swic ewee 3U a 3 mus ski some eo(s o a o a eo

us sicig a some ocaio is ey imoa o is ki o swic owee e

eaie eaiosi ewee sicig a oyaeyaio is o we uesoo I

e oowig some yoeica mecaisms ae oose a iesigae o e swicig o oy(Α sies o ye II a III gees

.. ye II Gee oy(Α Sie Swicig

o swicig ewee oy(Α sies i ye II gees e aeaie ascis ae

e same useam eos ece e as eo (see igue 5 e as eos ae

iee oy a e 3 e ue o e usage o iee oy(Α sies ee wo

oy(Α sies A a ΡΑ ae iesigae Sice e isace ewee ese wo sies is o og e wo sies ae a equa cace o e seece ie isace wi

o aec e seecio o iee oy(A sies us A as o aaage o e seece ee oug e egio aou A 1 as ee ascie is

igue .4 ye II gee oy(Α sie swicig A a ΡΑ eese iee oy(Α sies i e as eo 49

e oowig ou mecaisms M 1-M ae oose yoeicay a

eoeicay ase o e eisig ieaue

M1: A useam oy(Α sie as a ige ioiy a a owseam oy(Α sie a e soge e oy(Α sie is i as a ige oaiiy o eig seece

e seg o a oy(Α sie is geeay eemie y e usage o iee

oy(Α sigas e GU/U-ic eeme o e owseam egio o a oy(Α sie a some oe ukow cis-eemes I aso ca e esimae om e oya_sm

ogam a was iscusse i Cae 3 e smae e oya_sm scoe is e geae cace a e oy(Α sie wi e use

I ees sime ius ye 1 (S e tk gee eee o use e useam oy(Α sie (eome a Coe 19 accoig o e oowig wo acs iceasig e ume o oy(Α sigas 3 o e k-coig egio i o aec e

oa amou o oyaeyge A ouce; iceasig e isace ewee wo sigas cause a icease i e use o e 5 siga a a ecease i e use o e

3 siga is iicaes a iceasig e ume o oy(Α sigas may o icease e seg o e oy(Α sie a so AS e mai oy(A sie

eemia is o e uique aco o eemie e seg o e oy(Α sie e

ia mA i mouse ce uses e useam oy(Α sies imes as oe as

e owseam oy(Α sies (Aews a iMaio 1993 We ieica aem

ia sigas ae seaae y ewe a ey comee o oyaeyaio a e useam sie is aways cose eeeiay (a e a 199

I uma acue ymocyic eukemia ce e c-myc gee uses e

owseam oy(Α sie imes moe oe a e useam oy(Α sie

(ai-Oiga e a 199 o es i e owseam oy(Α sie is soge a

e useam oy(Α sies o gee c-myc, e 5 oy(Α egios wee oaie 0

om oyλ_Β (ee e a 7 a eice y e oya_sm ogam e

oya_sm scoe is 5 o e useam oy(Α sie a 133 o e owseam

oya(A sie is coime a e owseam oy(Α sie is soge a e useam oy(A sie Gee cox-2 uses e isa soge oy(Α sie moe oe

a e oima weake oya(A sie a e aeaie oyaeyaio is aso

issue-seciic (a-oga e a 5 Usig e same meo aie o c-myc, we oaie a oya_sm scoe o geae a o e useam oy(A sie a a scoe o 393 o e owseam oy(A sie o gee cox-2. ese ow eames coime a e segs o oy(Α sies wou coiue o e seecio o ose

oy(A sies is aso agees wi igue 39 i e Cae 3

M2: The upstream or downstream regions of the poly(A) site may contain some protein binding sequences, excluding the sequences necessary for polyadenylation factors.

I e useam o owseam egios o e oy(A sie coais some seciic sequeces wic ae e iig sies o some oeis o oyaeyaio

acos e seecio o is oy(A sie wou e moe comicae Oce e sequece aou e oy(Α sie is ou y some oei i may ee e

iig o e oyaeyaio acos a us owe e usage o e oy(Α sie

o eame i e sequece is ey ea AS e iig o e oei may ee

CS om iig o e AS is meas is oy(Α sie may e skie a e

oowig sie may e use

oyyimiie ac iig oei ( was sow o mouae e eiciecy o oyaeyaio y iig o a owseam eeme o e oy(A sie

(Caseo-aco e a eey iiiig e oyaeyaio a a sie a makig e usage o e oowig oy(A sie ossie e osoia se-ea 51

oei meiaes e oyaeyaio swicig i e emae gem-ie (Gawae e

a y iig o e muie S-iig sies wic icue e GU-ic

oy(Α eace us comeig o e iig o Cs- i io I is case e

seecio o a oy(Α sie o oy ees o e sequece i e oy(Α egio u aso o e aaiaiiy o some oeis a e A-oei ieacios e aeoius MU ecoes ie coiea mA amiies wee e useam oy(Α sie is eomiay use i e eay sage a owseam oy(Α sie is use i

e ae sage (eao a Imeiae 199 is may e cause y e asece o some oeis iig o e sequece aou e useam oy(Α sie i e eay sage

O e oe a i some oe oeis i o e egio ea o AS o o

e owseam U/GU-ic egio i may ecui e oyaeyaio acos suc as

CS a Cs moe eiciey a eace e usage o e oy(Α sie ea e

iig sies o e oe acos e U3 sequece as ee emosae o omoe

e ieacio o CS wi e coe oy(Α sie i eiiuses (Gaeey a

Gimai 199

M3: Secoay sucue coomaio o e-mA aou e oy(Α sie may iuece e seecio o oy(Α sies

I as ee ou a eessio o e 5 oy(Α siga a uiiaio o e

3 oy(Α siga occus i I-1 ey may e cause y e aii oo sucue aou e AS a eeoe iiiig e iig o e oyaeyaio acos

CS (Kases e a 1999; Gee e a e secoay sucue ea e

owseam U/GU-ic egio aso mig aec e iig o oyaeyaio aco

Cs o ou kowege ee is o eiece eoe o a 52

M4: The concentration of some polyadenylation factors will affect the poly(A) site selection.

I as ee sow a e aeaie oyaeyaio is sogy iuece

y iceasig e coceaio o Cs- esuig i a icease i e seecio o a weake omoe-oima oy(Α sie oe a soge omoe-isa oy(Α sie

(akagaki a Maey 199; Se e a 5 We o o kow o ay oe

oyaeyaio acos a ae ee osee o aec e sie seecio y cagig e coceaio

5.3.2 Type III Gene Poly(A) Sites Switching between 3U and 3D?

o e ye III gee oy(Α sie swicig e mecaism may e a ie iee comae wi a o ye II gees sice ee is o sicig ioe i e ye II gee oy(Α sie seecio owee some mecaisms escie o ye II gees may aso e ue o ye III gees o isace i some oe oeis i o e sicig oei iig sies a ock e iig o sicig aco sowig ow

e omaio o e sicig come e oowig oy(Α sie A (see igue 55 may e use I some oe oeis i o e useam o owseam o

oyaeyaio sies imoe e iig o oyaeyaio acos y

oei-oei ieacio acceeaig e omaio o e oyaeyaio come

e oy(Α sie A (see igue 55 may e use

I some oe oeis i o e useam o owseam o e sicig

iig sies a aciiae e iig o e sicig aco eeoe acceeaig e

omaio o e sicig come e sicig sie (see i igue 55 may e use

Aso i some oe oeis i o CS o Cs iig sie sow ow e iig o oyaeyaio acos e sicig sie may e use 53

I e wo oy(Α sies ae ey cose i ye III gees e mecaisms

escie i ye II gees may aso e ue o ye III gees I e wo sies ae

seaae a aa e e sicig ocess may comee wi e ceaage a

oyaeyaio ocesses a so aec e eemiaio o e oy(Α sie I e sicig sie (SS as ee ocesse is e is asci (e o oe i igue

55 wi e geeae; i e sice sie was skie o sicig (S aes e

e seco asci (e oom oe i igue 55 wi e geeae e wo mecaisms Μ5- ae oose

Figure 5.5 ye III gee oy(Α sie swicig SS sicig sies; S o sicig; A a ΡΑ e wo oy(Α sies o e gee A is a ioic oy(Α sie

M5: The strength of the splicing site (SS) and the poly(A) site determine which one will be used first.

e seg o e 5 sicig sie is eemie y e 5 ouay o oo sie o ios coaiig e caoica i-uceoie GU a e sequece comosiio aou e oo sie suc as e eoic sicig eaces (Caegi e a e aeaie sicig is aoe iicu oic owaays a e eaie mecaisms ae ess kow I is yoesie a i e SS is ey sog e is

asci wi e geeae Oewise e seco asci wi e geeae (See

igue 55

e ime ieece o e omaio o sicig come a oyaeyaio come may ea o a iee oy(Α sie I a sicig come oms ey as

e sequece (io coaiig oy(Α sie wi e sice ou a so e is 54

asci wi e geeae I a sicig come oms ey sowy suc a e

oyaeyaio come comees e omaio eoe e comeio o sicig

come e oyaeyaio wi occu e sicig come wi egae a e

seco asci wi e geeae e omaio ime is assume o e coeae

wi e seg o e sie

M6: RNAP 's moving speed may be related to polyadenylation, such as

transcription elongation, pausing, arrest, and termination.

asciio emiaio as ee eiee o e coue wi

oyaeyaio (uaowski 5 u e oe o asciio emiaio a

oyaeyaio is si ukow I emiaio occus is e e oy(Α sie

cose o e emiaio sie may e use I oyaeyaio occus is i may

aciiae e emiaio wiou coiuig o e seecio o a oy(Α sie Aso

e eecs i e asciioa eogaio aco ae ee sow o eace e

uiiaio o a useam oy(Α sie (Cui a eis 3 I is eiee a e

eecs cause e icease o asciio ausig a aes aciiae e

asciio emiaio a ece eici e oyaeyaio

.4 oose Moe o Aeaie oyaeyaio o ye IIΣ Gees

ase o e aoe oose mecaisms i is oious a e seecio o oy(A sies is a ey come ocess a ieeces eis amog iee gees a

iee secies o e aeaie oyaeyaio o ye III gees ee ae ee

oei comees ioe A oymeise II come wic is e asciioa maciey; e siceosome come wic is e sicig maciey; a e

oyaeyaio come wic is e ceaage a oyaeyaio maciey o mos messege A 3 e omaio A sime iea moe ca e oose o

ye III gee wi comosie eo (see igue 55 ase o e iea a e oe o

oei come omaio eemies e usage o e oy(Α sie i e sicesome come oms is e is asci wi e geeae; i e oyaeyaio come oms is e seco asci wi e geeae

Ae e asciioa iiiaio A moes aog e A emae sequece a e-mA asci is ouce We e A asses e 5 SS

e sicig come egis o om e ime eee o e comeio o e come may ee o e seg o e SS e aaiaiiy o sicig acos a oe ukow acos We e A asses e oy(Α sie e oyaeyaio come egis o om e ime eee o e comeio o e come may

ee o e seg o oy(Α sie e aaiaiiy o oyaeyaio acos a oe ukow acos e eogaio see o A wi aso aec e imig oe o se u e moe some aamees ee o e eie suc as e aeage

ime o sicig a oyaeyaio come omaio a e A moig see

e oea eacio ae o asciio o aceia A oymeise is —

/seco a 37 °C a is is aou e same as e ae o asaio (15 amio aci/seco u sowe a e ae o A eicaio ae ( /sec (ewi

3 I io e see may e owe ie aou -5 /s (Koea e a

e ime o assemy o e ceaage a oyaeyaio aaaus is aou

1 secos i io a is ase o a sog oy(Α sie a o a weak oe (Cao e a 1999 e easo wy oyaeyaio assemy oms so as may e a e ceaage a oyaeyaio ocesses ee A C (ak e a ; Kaeko a Maey 5 a e A ca oy say aou e oy(A sie o aou 1 secos (wii 5 o e see 5 /s owee e assemy ime may ee 6 o e oyaeyaio acos suc as CS Cs a C ec sice i as ee sow a e coceaio o Cs- wi aec e seecio o oy(Α sies

(Se e a 5 e ime is e a ucio o may acos a is Τ p = (ρι

ρ wee ; ae e oyaeyaio acos a e seg o e oy(Α sie

e omaio o e siceosome assemy wi ake seea miues i yeas

(uy 1997 Aso e ime may ee o e sicig acos a is Τ S = s(si s

wee s; ae e sicig acos a e seg o e sicig sies Assume e

ime o e A aeig uceoies is Τ„ wic is a ucio o e isaces

om e sicig sies a e oy(Α sies a oe acos a aec e see o

A „ = ( 1 2 wee ή ae some ukow acos icuig e coceaio o s (iu a Aes 1995; ee e a Aso e see o

A aies wi iee gees ue o e iee comosiios o e sequece

igy acie gees (eg A o ea sock gees may ae ige eogaio

aes (Giaia a is 1993; Coo e a 1995 ee is some eiece sow a

e isace ewee e SS a oy(Α sie (S wi aec e seecio o iee

oy(Α sies (ei e a 199; Qiu a ie

ase o a ese eiece a e yoesis we oose a i „ + p

S e sicig occus is oewise ceaage a oyaeyaio occu is (see

igue 5

A

igue .6 Comeiio ewee a SS a a S SS sicig sie; S oy(A sie; e see o e A oymeise II (A; e isace ewee SS a S

is yoesis is ey iicu o es ue o may ukow acos suc as

e A eogaio see e sicesome oyaeyaio come omaio

imes a e aaiaiiy a coceaio o oei acos a wou aec e sicig a oyaeyaio

Is ee ay coeaio amog e isaces (IS ewee sicig sies a

oy(Α sies e segs o e 5 SS a e segs o S? e scoes eecig

e segs o ese sies wee geeae (iae aa oaie om ias a a ey wee assume o e coeae wi e come omaio ime e 5 sicig sie scoes (SC5 age i (-17 11 a oyaeyaio sie scoes

(As age i (-75 9 e As ee ae geeae ase o SSM oy a

ey ae a ie iee om oya_sm I is assume a e smae e scoe e

oge e ime eee o e omaio o e come We oice a ee is o oious coeaio ewee em om e o o IS s e scoes (see igue 57

is is o we a eece a i e isaces ewee SS a S ae age e A scoe sou aso e age a e 5 sicig sie scoe sou e sma suc a

ee is some comeiio ewee e sicig a oyaeyaio ee sou eis some oe acos

e we ook e A moig see io accou Geeay sicig akes

oge a oyaeyaio A assumio was mae aou e ime ase o e eisig ieaues e SS wi e miima SC5 ees secos o iis e assemy o a sicesome come a e SS wi e maimum SC5 ees 5 secos o iis e assemy; e S wi e miima As ees 1 secos o

iis e assemy o e oyaeyaio come a e maimum As ees 1 secos o iis e assemy We aso assume a e eaiosi ewee e scoes a e imes wee iea a e A see was 5 /s 5

igue 57 Scae os o SC5 a As s IS SC5 sicig sies scoe; As oyaeyaio sie scoe; IS isace ewee a sicig sie a a oy(Α sie

ase o e aoe assumios a o is mae a sow i igue 5 (a

Eac oi i e o eeses oe gee ase o e scoes o sicig sies a

oyaeyaio sies e -ais is e esimae ime o e omaio o a sicig come a e y-ais is e esimae ime o e omaio o a oyaeyaio come us e ime o A aeig om SS o S I ey ae isiue aog

e iagoa ie a meas o oy(Α sies ae caces o e use e maoiies ae ocae aou e oe cee wic is e same as eece u some ae si ocae eicay (see igue 5 (a wic meas a ee i e isace is

ey age o e oy(Α sie is ey weak e owseam weak oy(Α sie is si seece is imies a e seecio o a oy(Α sie is o oy eemie y e seg o e sie u aso may oe ukow acos e A see mig e

ey ig o some age ios o e sequece aou e 5 sicig sie a e

oy(Α sie coais some iig sequeces o oe acos I we eee gees wi og isaces o we imi e ime o oyaeyaio a A aeig o e wii s e o sow i igue 5 ( eecs e comeiio

igue .8 oyaeyaio assemy ime + A aeig ime s sicesome assemy ime Eac oi eeses oe gee wi a ioic oy(A sie e uis o e - a y-aes ae i secos (a A ossie gees ( Gees wi imes o oyaeyaio assemy a A aeig ess a secos

ue o e age ume o acos coiuig o e seecio o iee

oy(A sies i is ey ecessay o ocus o some seciic gee o uesa aeaie oyaeyaio Maemaica moeig is a sog oo o suy e

yamics o e seecio o iee oyaeyaio sies a is eas e as a o is wok CHAPTER 6

MATHEMATICAL MODELING OF CSTF-77 ALTERNATIVE POLYADENYLATION

6.1 Introduction to CstF-77

ο uesa aeaie oyaeyaio some ossie mecaisms eig

oose ae ee escie i e eious secio e mecaisms coceig

e seecio o iee oy(Α sies ae ey come i geea a iee gees may use iee aways eeoe we wa o ocus o some secia gees

o suy e eguaio o wo iee ascis ouce y aeaie

oyaeyaio Gee cstf-77 (someimes cae csf3), wose oei ouc is oe o e suuis o Cs was cose o is suy is gee was seece o wo

easos e gee ouc is a oyaeyaio aco a is gee ca ouce moe

a oe asci ue o aeaie oyaeyaio We eiee a is ki o gee o oy ca coo e eiciecy o oyaeyaio y aeaie

oyaeyaio u aso ca om a eeack oo y auo-eguaio

e aeaie asci o uma cstf-77 ue o aeaie oyaeyaio

as ee ou ecey (a e a Moeoe i as ee ou a e eessio ees o e wo ascis ae oosie o eac oe ase o e oseaio o e SAGE aa (see igue 1 A ioic oy(Α sie as ee

ou o csΐ-77 a is eas o e eaiey so Cs-77 asci (eee as

Cs-77S (see igue e so om oei acks e ucio o a oma

om (e oma Cs-77 asci is eee o as Cs-77 e og asci as a suui o Cs (a e a a is eaie ucio is si ukow

60 6

igue 6. SAGE iay aa o Cs-77 a Cs-77S Eac o e 9 eica ies eeses a iee iay e o ow eeses e eessio ees o e og om o Cs-77 a e oom ow eeses e eessio ees o e so om o Cs-77 igue ake om (a ea

igue 6.2 asci sucue o gee cstf- 77. e use o a useam oy(Α sie geeaes e Cs-77S a e use o a owseam oy(Α sie geeaes e Cs-77 igue ake om (a e a

A ume o gees ae ee ou o coai muie ascis wi oosie eessio ees om Cae Some o e ascis ae geeae ue

o aeaie oyaeyaio Oes ae ue o aeaie sicig Is i eay ue

a e og a so om asci o Cs-77 ae oosie eessio ees? I

is is ue e ow oes is ae? Wa coos is ye o yamica eessio i a issue-seciic way? I ese wee uesoo i wou e us o uesa e mecaism o aeaie oyaeyaio

ee is o easo o assume a ey ieac iecy wi eac oe y

A-A ieacio Oe ossie way o e eguaio is a e oei

oucs o e og a so om ae iee eecs o e asciio 62

eiciecy o geeaig e og a so om ascis a us om a

auo-eguaio ewok We kow a e og om oei is a geea

oyaeyaio aco a is equie o oyaeyaio owee e ucio o

e so om oei is ukow

e coceaio o Cs- as ee sow o eguae gee eessio

(akagaki a Maey 199; Se e a 5 u o ou kowege ee is o

eiece o Cs-77 a oe oyaeyaio acos o eguae gee eessio

o se u e moe some assumios wi e mae o e ucio o e so

om oei

6.2 oose Moe o Aeaie oyaeyaio o Cs

e uma Cs-77 so om asci was is escie i (a e a a

o aiioa ieaue as ee ou eae o i is asci is geeae y e

ioic oyaeyaio sie wic as ee suie wiey (ou e a 199; uce

e a 3; ia e a 7 owee e eguaio ewee e wo ascis

om e same gee as seom ee suie ee ae a ume o easos o e

ack o is ki o suy is o a o e oei oucs om iee

ascis o e same gee ae we uesoo Seco a age ume o

aeaie ascis ae eesse a ey ow ees a e cue ecoogy ca

o eec ese ascis i e eguaio ewee wo iee ascis

om e same gee seom aws oo muc aeio ou oeia oucs om

iee ascis may o eguae e eessio o e same gee

ee we iouce a ew maemaica moe o iscoe a uesa e

eguaio o wo iee ascis geeae om e same gee ue o aeaie

oyaeyaio We e eemie e cosequeces o e yoeses i e moe 6

ese wo ascis ae e same asciio iiiaio sie a e 5 e a ey

ae e same 5 u-asae egio (U owee e oyaeyaio sies a

e 3 e ae iee wi oe ocae useam o e oe e wo oy(A sies

ae cae e useam oy(Α sie (UA a e owseam oy(Α sie (A

I e UA is use e e A wou o e use o is asciio iiiaio

O e oe a i e A was skie o some easo e e A wou e

use e eaie mecaism o we a ow e useam oy(Α sie is use is

o cea a some ossie mecaisms wee iscusse i Cae 5

e eicie usage o e oy(Α sie is assume o ee o e oei

coceaios o e oyaeyaio acos Some oyaeyaio acos suc as

Cs- (Se e a 5 ae ee ou o aec e oyaeyaio eiciecy

owee i is suy we assume e coceaio o Cs- is e same o a e

ime a is e Cs-77 og om oei is assume o e e oy aco a

wou aec e oyaeyaio eiciecy e Cs-77 og om oei is

caaceie as oe suui o Cs wic is ecessay o e ceaage o e

e-mA e ucio o Cs-77 so om oei is o kow ee e so

om oei eie as o ucio o i may ae a iiioy eec o e ceaage

ase o e sucue aaysis i (a e a Moeoe i i as e same

ucio as e og om oei e i ca e eae as a og om oei a e

sysem wou ecome a osiie eeack sysem i oe aiae us ee is o

eguaio y seig u e moe a usig simuaios i is ossie o euce a

uaie ucio o e so om oei

Some wok as ee oe o suy e asciio iiiaio ae o some

seciic gees o eame e AA o coaiig e omoe egio wi aec

e iig o e AA iig oei ( eeoe aecig e asciio iiiaio ae (Aoiou e a 1995; ooes e a 199 e sucue o A a

o-coig A aso as ee imie i asciio iiiaio (Youg e a ;

OGoma e a o ou suy we assume e asciio iiiaio ae is cosa o gee cstf-77.

e eguaio ewok ca e escie as a seies o ees (see igue 3 we e asciio sas a e asciio iiiaio sie (IS e A aes aog e A emae sequece a e e-mA is ome; we e UA is eose ceaage a oyaeyaio may ae a a ime a e Cs-77S is

ome; i e UA is o use o some easo e A coiues o moe a e

A wi e eose; e Cs-77 wi e geeae y usig e A; e mA

ascis wi e eoe o cyoasm wee ey ae asae io oeis; ae

e oig a some moiicaio ocesses e oeis wi e e asoe

ack io e uceus o ac as oyaeyaio acos a e ey wou aec e

asciio eiciecy y oyaeyaio

igue 3 Aeaie oyaeyaio o Cs-77 IS asciio iiiaio sie; UA useam oy(Α sie; A owseam oy(Α sie Cs-77S a Cs-77 eese e so a og om mA ascis CS-77S a CS-77 eese e so a og om oeis Cs-77S is geeae y usig UA a Cs-77 is geeae y usig A 6

6. Maemaica esciio

A-oei ieacio ca e iewe as a susae-eyme ieacio a e

Micaeis-Mee equaio as ee use o escie is ki o kieics (a e a

3 Goowi (Goowi 195; Goowi 19 osuae a osciaoy moe o simuae e yamics o mA a oei ieacios a e yamica eaio o is moe was aaye y S Gii (Gii 19(a; Gii 19( Sice

e oy(Α sie o e so om is ocae useam o e og om is iceases

e oouiy o e so om o e ome is is aso ca e see om e

oose mecaism M1 om Cae 5 us i is aua o assume a e

asciio aes o e og a so oms ae e oowig eeseaios usig e Goowi osciao

(1

wee a Ρ ae e asciio aes o e og a so oms eseciey α is e maimum asciio ae o e so om L a S ae e eaie oei coceaios o e og a so oms eseciey 1ο is e asciio iiiaio ae wic is assume o e cosa K is e equiiium cosa n is e

i coeicie a ε is e eaie iiiio cosa o e so om ee we

assume a Iο = 1 + Ρ , wic meas a o ay asciio iiiaio eie e

og om o so om is ouce I ε = e e so om oei as o eec o oyaeyaio

As we kow i eukayoes we a gee is ue o a messege A wi

e mae ocesse a asoe om e uceus o e cyoasm eoe e

oei is geeae O e oe a we a gee is ue o e oei 66

oucio wi o e aece immeiaey ui e A a aeay as ee

ascie ecays e eay ime o e omaio o a ucioa oei om a maue mA moecua as eceie some aeio (Scee e a 1999; Coe e a 5 e ime eays ae iee o iee gees e coeaio ewee mA a oei auace as ee wiey suie i oe secies (Geeaum e a 3; ie e a owee e coeaio oes o aea o e we

eie ecause o oise a oe acos

e oei coceaio is moe iicu o measue a mA eessio

ee eeoe e aiaes i e moe wi e cose o e e eessio ees o e mA ascis We assume a e oei ee a e cue ime is

ooioa o e mA ee a as ee ascie a a eious ime us we

— ae e eaiosis L(t) 1(t- 7) a S( - s(t- ro), wee is e ime aiae 1(t) a s( ae e eaie mA eessio ees o e og a so om a ime

τ is e ime uaio ewee e omaio o e mA a e ucioa oei wic icues e ime o iA asoaio om uceus o cyoasm

asaio oei oig oei eocaio ec Aso e ime eay is assume o

e e same o o e og a so om asci

ase o e mass aace change of expression level = production rate — degradation rate, e yamics o e og a so om ascis is moee y

e oowig ieeia-eay equaios

(

wee a s eoe e egaaio aes o e og a so om ascis

eseciey I is aso ossie a o some asciio iiiaio o oy(Α 6 sies ae skie a a ee is o asci geeae a is someimes is cae

asciio aoio (Sousa e a 199; oca a aci 3 e oaiiy

a e A is use wi e assume o ee o e coceaio o e

oyaeyaio acos caaceie y e i equaio wi a iee equiiium cosa a use y UA y sigy moiyig e is equaio o

Equaio ( we oai a moe comee u moe comicae sysem

(3

6.4 eoeica Aaysis o ieeiaeay Equaio

e ieeia-eay equaios (Es ae iicu o aaye maemaicay i geea Α E (aso cae a eae ucioa ieeia equaio E is a secia ye o ucioa ieeia equaio I is ey simia o a oiay

ieeia equaio (OE u is souio ioes as aues o e sae aiae

us e souio o a E equies kowege o e cue sae as we as e sae a a ceai eaie ime eeoe e E is a iiie-imesioa sysem so we ee o seciy a iiia ucio o a iiia ime iea

o uesa e yamica eaio o Equaio (3 a sime E wi

e cosiee K, = a € = mea a e so om oei as o eguaoy

ucio a a i geeaes eie a og o so om asci ae e

asciio as ee iiiae I is case Equaio (3 is euce o e oowig sime oe 68

(

oe a e is equaio is ieee o e seco equaio so e aoe sysem ca e suie y e oowig sige E

(5

Equaio (5 as si aamees (Io α K τ a i ca e euce o ie aamees y e asomaio

1( _ 1( K

e esus is

oig e - o 1(t), we oai

(

I Ii = 0, e is ecomes e same equaio as a as ee suie y Mackey a

Gass (Mackey a Gass 1977 o wic ey oaie some esus I ei eo

e oowig wo iee equaios wee cosiee

(7

( 6

wee = ( — τ β ae osiie cosas ey caime a e souio o

Equaio (7 wi eie aoac a ie oi o sow a sae imi cyce

osciaio wie e souio o Equaio ( ca e caoic

I e oowig Equaio (5 wi e suie i moe eai a some

cocusios wi e mae (see aso (iekma 1995 wo quesios ca e

aesse aou Equaio (5 oes Equaio (5 ae a eioic souio a

wa is e eaiosi ewee e eio a e eay?

Equaio (5 as oy oe seay sae osiie souio 1 wic is

ieee o e ime eay I we se e ig-a sie o Equaio (5 o eo wi eo eay we oai

e equaio ca e wie i aoe om

e aoe equaio ca ae oy oe osiie souio sice e ie o e

e sie a e cue o e ig sie o e aoe equaio ca iesec oy a oe

osiie oi Ιο. The saiiy o is seay sae souio ca e ou y ieaiig

e o-iea em aou e seay sae e saiiy aaysis is muc moe come wee ee is oeo ime eay Ae ieaiaio o Equaio (5 e

oowig iea E is oaie

(9 wee £(t) is e euaio aou e seay sae souio o a 0

Ae scaig e ime y seig = τot , we ge e oowig iea E oig

e o (

(1

We ook o e souio i e om ( = eZ` . Ae susiuio we oai

e caaceisic equaio o Equaio (1

(11

Sice is a ukow ume wie = μ + i a susiue is io e aoe caaceisic equaio we oai e oowig aameeie equaios

(1

I μ = e Equaio (11 as e ue imagiay souio a Equaio (1

as a osciaoy souio y seig μ = we ge e oowig equaios

(13

e aoe equaios ae ee i v, so we ca esic v _ e aoe eessio as siguaiies a = kπ k = 1 a we ca iie e ig a ais o u io ieas suc a sie ucio as a sige sig i eac iea

eie e ieas 71

a eme e cues Ck i e ( a β -ae aameeńe y u wic is ayig wii e iea Ik

(1

We ca o a e cues o Equaio (13 i e same (a β -ae y

ayig e aamee a we ge e aamee ase ae (see igue

igue 6.4 aamee ase ae o e iea E. a a β ae e coeicies i Equaio (11 ;± is e egio oue y soi cues a c; ± is e soi cue geeae y Equaio (13 e umes gie ae e coo eese e ume o oos o Equaio (11 ocae i e ig a ae

Some oeies ae o o e aoe cues 72

Lemma 6.1 o eac cue Ck as we ay e u om e e e o e ig e o e iea Ιk α is eceasig β kees e same sig a e eeme aue o β is a a=1

u cos oo om a(υ = akig a eiaie wi esec o we ge si

Sice si ν — ν S o a u 5 α is eceasig β = — oes o si υ cage e sig we cages wii e iea ± β o e cue Ck a β o e cue C Aso

We si — cos = we ae α =1 Ί

Lemma 6.2 A e cues Ck eie i Equaio (1 o o iesec wi eac oe a ey ae asymoic o e ies α ± β =

oo e cues Ck a Ck o o iees wi eac oe sice β o e cue Ck a β o e cue Ck . o oe a ee ae o iesecios amog e cues Ck o C we ca use coaicio I ee eis wo isic

cues iesecig wi eac oe e om Equaio (1 Ξ νk Ε 1k E Ι

k we ae e oowig 73

a is

e we ae uk = ν 1 wic coaice wi e assumio a k a v

eog o iee ieas a ese ieas ee iesec Sice we v —» kπ - ,

ucosv + υ so e cues Ck ae asymoic o e ies α ± β = ❑ si v si

om emmas 1- we kow a o eac oi (α β yig o e cue

C ee eiss a uique ν Ε Ιk a e souio o Equaio (11 wou e

= ±ii

I oe o eemie e ume o oos i e ig-a ae i oe

egios oe a o e cues we ee o i υ a νo a we ae μ -cue

(15

ι 1 l e age eco o e μ -cue is gie y ν = υ a να

s υο

We ae o comue e ie ouc o ν a a esigae oma eco

oCk e aows aog e soi cue Ck (see igue iicae e iecio 7

o u iceasig I l = ( q is a age eco o C e ν1± = (—q is e

oma eco Α eco = ( s ois o e e o e cue comae wi e

1 aow iecio i a oy i •ν1 e age eco o e cue Ck is gie y

e oma eco is gie y

So wee e eco ois o e e o e ig ees o e sig o

Is ue a

o ν sice ( = a ( = ν(1— cos u o VEIk±. e sig o

1 ν •1 oes cage aog e cue Ck sice si ν oes cage e sig aog e cue a is i we i a uo o e cue Ck o CO3 a icease

1 e μ om aoe we kοω a y •1 a so e μ -cue wi go o e e o e cue Usig e same agumes i we i a ν o e cue Ck a

1 icease e μ om aoe we kοω a •1 a so e μ -cue wi go o

e ig o e cue 75

We ca cocue a we moig away om Ck o Co o e e o moig away om Ck o e ig e ciica oos ocae o e ue imagiay ais moe io e ig a-ae

Lemma 6.3 The μ -cue wi iesec eac o e Ck o Ck cues i a eas oe oi we we ay μ om —α o

Proof. om Equaio (15 e μ -curve is gie y e oowig eoeia

ucio

(1

i ca e see a f(u) is a coiuous ucio i e iea Ik ow ook a e

imi o e ucio f(v) as υ aoaces e e ois o e ieas

ν 1) m f(v) = lim (— e si = 11I11 (— esiι ν = im (χe- = —α ν- (k-1π- υ- (k-1π- Si υ-(k-1π- si -±-α a is we υ Ε 1k = ((k — Ιπ kπ ( akes e age om Ο o coiuousy; we υ Ε Ιk = (kπ (k + 1π ( akes e age om o Ο

coiuousy Sice e curve Ck± aoac e iagoa ie e eoeia

ucio mus iesec wi C 6

om aoe we ca see a o some aue k E Ιk we ca i e

μ -cue om Equaio (1 ee mus eis a iee aue ν Ε 17 suc a

e μ -cue coesoig o is eacy e same wi e μ -cue coesoig o k ecause o e oeies o e ucio (u escie i

emma 3 e ume o oos i e ig-a ae cages we e aamee

ai cosses Ck± o as you oow e μ -cue o a aamee ai (a β i e oe egio k oue y e o ies a cues e ume o e oos

ocae i e ig-a ae is gie y e umes oowig e coos i e

igue I we ace ack aog e μ -cue we ge a iiie ume o oos

ocae i e e-a ae ece aog e a -ais Sice e μ -cue as o iesecio wi e a -ais e ume o oos o e a -ais ca o e coue

y is oceue e a -ais coesos o β = Ο a e caaceisic equaio

as oy oe oo a = a

I cocusio o Equaio (11 ee ae wo ue imagiay oos o

aamees o e cues Ck ; e ume o oos i e ig-a ae o

aamees o e cues Ck is e same as i e oe egio k_ 1 ; a e ume o oos i e ig-a ae o aamees o e curves Ck is e same as i e oe egio e ume o oos i e e-a ae is iiie ece aog e a -ais

As o wy e ume o oos i e ig a-ae is cosa i e egio

oue y e cues Ck a e ie a eoem escie i e ook

(iekma 1995 says e oowig

Theorem 6.1 (Coiuiy o e oos o a equaio as a ucio o aamees e

Ω e a oe se i C a coiuous come-aue ucio o III III Ω suc

a o eac (α β ι--i (α β is aayic i Ω e ω e a oe suse o

Ω wose cosue ω i C is comac a coaie i Ω e α β e suc

a o eo o (a β is o e ouay o w e ee eiss a

eigouoo U o (α β i Ili x R suc a

• o ay (α β e U (α β as o eos o e ouay o ω ;

• e ume o eos o (a β i ω akig muiiciies io accou is cosa o (α β e U Ί

us e aamees a make e iea E ae eioic souios mus e

ocae o e cues Ck om e eiiio o e aamees om Equaio

(1

we kow a a β . a is o Equaio (5 e aamee oi (a β is

ocae o e III quaa i wic e u ages i e iea

(π + kπ υ (- + kπ us we ae e oowig cocusio

Lemma 6.4 e eio o e eioic souio o Equaio (5 mus saisy

τ0 τ0

oo om aoe we kow a e eioic souio as e om e a

- e eio o e souio is = π wee π ν υ 78

so we ae T' Sice we scae e ime y e asomaio i = τot , e

eio o e oigia sysem wi eay τ 0 wi saisy τ0 T < τ0 ❑

emma gies us a way o esimae e aamee τ

6.5 Numerical Simulations

ieeia-eay equaios ae iicu o soe aayicay (Muay 199 a may esus ae ee oaie aou e iea a oiea E suc as saiiy aaysis e eisece o eioic souios e iucaio aaysis ec (eie a

Mackey 19; osso e a 1993; Mackey a ecaea 199 I a e ie

aamees Ι τ α K ε a K, ae gie we ca soe e E equaio (3 y a umeica meo

Α o o sowae ackages ae ee eeoe o soe e DDE, suc as

E-IOO wie i Maa (Egeogs e a EI wie i

C/C++ (Woo 3 ΚΑG wie i oa 77 (Cowi e a 1997 a e3

(Samie 1 wie i Maa e ae wo ae use mos oe a e3 is oe a o Maa (esio o ae so ey ae use i is suy e ackage

e3 i Maa (uig i wiows eium M 17G CU 1 G AM is sowe

a ΚΑG (uig i iu wi e same comue seigs e3 ook

37s o comue e souio wi e ime uaio o 1 o Equaio (3 wie ΚΑG oy ook 11s 3 imes ase a e3

Sice e iiia ucio (I is ey iicu o eemie ioogicay we use a cosa iiia ucio (CI i is wok I o sae oewise e CI o

e E is aways l(t) =s(=1 —τ _ t<_ 79

The solutions of Equation (6.3) could be very complex for some set of parameters. Different types of solutions are possible, such as steady-state solutions, periodic solutions, and chaotic solutions. One solution is shown in Figure 6.5 for randomly selected parameter set.

Figure 6.5 The phase plane of the long and short form transcripts. The solutions of Equation (6.3) are obtained for randomly selected parameter values given at the top of the plot. The constant initial functions for the long and short from are set equal to 1.

6.6 Parameter Estimation

Some of the nine parameters in Equation (6.3) have been chosen according to the following considerations. For the transcription initiation rate, without loss of geeaiy we se Ιo =1. The parameter a, the maximum transcription rate of the short form, needs to be less than the transcription initiation rate 1, and it is to be determined. The Hill coefficient representing the cooperative binding process is difficult to determine. In the modified Goodwin oscillator, Griffith (Griffith 1968(b)) 80

ou sae imi cyces oy wi a i coeicie o eig o moe I Scee e a

(Scee e a 1999 e i coeicie aies om 1 o ; i iog e a (iog a ee 3 ey use 5 as e i coeicie; i Goee (Goee 1995

e i coeicie is ; a i Eowi e a (Eowi a eie e i coeicie is Sice o iee aciao a iiio susaes e i coeicies ae iee we cose a i coeicie o i is suy

A iscee eay as ee iouce o suy gee eguaio Mok (Mok

3 use a asciioa eay aou 15- miues o suy e osciaoy eessio o e asciio acos es 53 a -k ewis e a (ewis e a

1997 icue a eay o o e eio oei (E o suy e cicaia

yms om e eious eoeica aaysis we kow a i e eioic souio eiss e eio o e eioic souio o e iea DDE equaio (1 is

ewee wo o ou imes e eay ime is is aso ue o Equaio (3 i e sysem as a eioic souio esuig om a aom simuaio (see igue I

as ee ou a e oei oucio aes ucuae oe a ime scae o aou oe ce cyce a ces aou o iie ouce o aeage wice as muc oei

e ui o ime as ewy iie ces (osee e a 5 is moiaes us o esimae a e eio o e oei oucio osciaio is iecy eae o e ce cyce o a yica mammaia ce e ce cyce ass aou us i is

aua o esimae e ime eay o e oe i o e ce cyce eio 81

Figure 6.6 Relationship between the delay time and the period of Equation (6.3). The parameters for these two plots are Io = 1, = α = 0.8462, Κ = 0.5252 ,

= 0.672 1, ds = 0.8381, 6. = 0.019, Κ1 = 0.2026 . The delay time is 4 for the left plot and 8 for the right plot. The period is —11 for the left plot and —23 for the right plot. The constant initial functions for the long and short from are set equal to 1.

To solve Equation (6.3), we have to estimate the remaining six parameters

α K 1 s ε K1 Generally, these parameters are difficult to estimate by biological experiments, thus, in-silicon estimation comes into play based on the experimental data. An optimization method will facilitate the estimation procedure.

Very little information is known about the magnitude of these parameters, so we assume that all of the six parameters are located between 0 and 1. To find an optimal parameter set in the six dimensional parameter space, we use a global searching method.

The parameters in the model equations can be estimated by minimizing the error between the experimental data and the simulation data. Assume that the

experimental data contains m time points at [t ... tm ] for both long and short form

mRNA expression levels, i.e., Ε = [ 1„ i ; s1 s « ] For any given set of parameters in the parameter space, the DDE system (6.3) can be solved and evaluated at these m 82

ime ois wi a CI ie Ν(α Κ l s ε Κ' = [1 1; sl sn ] e mea

m

Σ (i -/ ± (Si - S squae eo is cacuae y e =1 i=1 . We wa o i a m oimum aamee se o miimie e eo I e eo is ess a some oeace say 1 we say a e aamee se is a goo aoimaio u i ac we ca

ee aciee is oeace ecause o e oise i e eeimea aa Sice e

aamee sace is coiuous i is imossie o eauae a souio a eey sige

oi So isea we ca eauae e eo a uiomy isiue ois i e sace o aoimae e eo

I we uiomy iie eac aamee sace io ois y esimaig a

ossie comiaios o e aamee aues we ca i e goa miima eo

y soig e E equaios imes y usig e ΚAG ackage i wi ake

11 secos o eame i =5 ie we seac e sace wi a se o i wi ake a a yea o iis e seac o acceeae e comuaio e message

assig ieace (MI o aae comuig is aoe We use e ya cuse i

e eame o Maemaica Scieces a I wic was ue y a ga om

S o o e comuaio Eac oe coais AM Oeo 5 CUs a G o AM I we use oes om ya e i wi ake oe week o iis e seac wi a se o i eey aamee sace o ue acceeae e comuaio we aso assume e egaaio aes o e og a so om ae e same ie l = I is case i wi ake ous o comee e o o =5 wi

oes We ou a is was acceae a we ca use is amewok i e

uue i we ae some eeimea aa 83

6.7 Results

6.7.1 Experimental Data

e oowig aaga is oie y eua a om i ias a

ea ces wee seee a 7% couece i 1-we ae i DMEM meium suemee wi 1% S oa ceua RNAs wae eace usig e easy ki (Qiage accoig o mauacues oocos mRNAs wee eese-ascie usig oigo- imes (omega ea-ime quaiaie C was caie ou usig

e 75 ea ime C sysem (Aie iosysems wi Sye-Gee I as ye e

oowig imes wee use o eecig aious egios o e Cs-77 gee O

(5-GAGGCCAGCAGGAGAC 1 (5-GCAACCCAAAAGCAACAA 3

(5-CAAAACAAGGCAAAACC imes o Cycoii A (CY wee

5-AGGCAACCCCACCGG a 5-CCGCGCGGAACGC

ie ime oi aa wee oaie a 1 3 5 a e eaie eessio

ee is sow i igue 7

Figure 6.7 Eeimea aa a ie ime ois e -ais is e ime i ous a e y-ais is e eaie eessio ee omaie o e is ime oi a 1

6.7.2 Optimal Parameter Set

y usig e meo escie i Secio 5 a e eeime aa oie y

i ias a e oima aamee se we oaie was 84

a= 0.88, K =0.98, = s = 0.44, ε =0.02, Κ1 =0.10, which led to the smallest error of e = 0.4319 for CIF equal to 1. The simulated results and the experimental results are plotted and compared (see Figure 6.8). The simulated oscillation fits the experimental oscillation well, but not perfectly due to the noise in the experimental data and some unknown factors.

Figure 6.8 Solutions fitted with optimal parameter set. The optimal parameter set is obtained by the minimization of the error between the experimental data and the simulation data. The experimental data is shifted 100 hr to the right for better representation.

Except the obtained optimal parameter set and the smallest error, different combinations of parameters with errors less than 0.5 are saved. The distributions of these parameters are plotted (see Figure 6.9). We can see that parameters

α Κ 1 s ε are more sensitive than K for the errors. Several observations can be obtained from the numerical simulations:

• The parameter α is less than 1, but close to 1, which is what we expected since the maximum transcription rate is less than the transcription initiation rate. If the long form protein is over expressed, the cell would generate more short form transcripts, and the transcription rate of the short form approaches 85

the transcription initiation rate so that the cell can control the long form production very efficiently.

• Κ1 is less than Κ is is ecause ee ae wo oy(Α sies ocae i e as eo o geeae e og om (a e a wo oy(Α sies i e as eo ae aso ou i oe secies eg osoia meaogase a osoia iiis (Auie a Simoeig 199 a is i e og a so om ascis ae e same oouiy o oyaeyaio e oaiiy o geeaig e og om is ige a o e so om

• The parameter ε is very small, which means that the short form has a small inhibitory function or no function at all. The CstF-77.S protein has not been detected yet (private communication with Dr. Bin Tian). If expressed, from the putative protein sequence, it would only contain the first HAT domain, and lack the C-terminal region responsible for interacting with CstF-64, CstF-50, and itself (Takagaki and Manley 2000). It is not clear if it would bind to CPSF-160 or the CTD since the protein regions for these functions have yet to be mapped.

Figure 6.9 Distribution of parameters with error less than 0.5. The parameter values are given on the x-axis, and the frequency data for each parameter are given on the y-ais 1 a age om o ; ε ages om o ; Κ ι ranges from 0 o ; Κ ages om 9 o 1; a ages om 7 o 1

6.7.3 Different Initial Functions

The CIF 1( = s( = 1, —τ _ 0 is used for the unperturbed system. In fact, the initial conditions are arbitrary and are hard to determine based on the biological experiments. Most biological systems are very robust, for example, proteins can tolerate thousands of amino acid changes, metabolic networks continue to sustain life even after removal of importance chemical reactions, gene regulation networks 86

coiue o ucio ae aeaio o key gee ieacios ec (Wage 5 I

ems o ousess i meas e sysem wi coiue o ucio i e ace o

euaios

Some eeimes ae ee oe o kock ow o euce e og a so

om gee eessio ees (uuise aa om i ias a a is

coesos o e simuae moe wi iee iiia coiios Kockow is

associae wi a ecique o euce e gee eessio ees u o euce e

gee eessio ees o a ey ow ee o eo wic is cae kockou A

ieeece (Ai is a ey commo kow-ow ecique I we kock ow e

og om e e iiia coiio o / ( ees o e se o a smae aue; o e

oe a i we kock ow e so om e e iiia coiio o s( ees

o e se o a smae aue I we kock ou e og om e iiia coiio o 1(

ees o e se o a ey sma aue o eo We ie iee CIs a simuae

e cosequeia "eoye" I o seciie e CI wou e e cosa 1

I e og om is kocke ou ie e og om CI ess a 53 e

e souios go o eo (see igue 1 is imies a wiou e og om

oei e ce wi ie sice e og om oei is esseia o e ce o suie

I as ee eiie y eeime a i e Suesso o oke (Su( a

osoia omoogue o Cs-77 is kocke ou e osoia wi ie a e aae sage (Simoeig e a 199 87

Figure 6.10 Solution of Equation (6.3) with long form CIF set to be 0.0652. CIF, constant initial function. The short form CIF is set to 1.

If we just knock down the long form, i.e., the long form CIF greater than

0.0652, the cell can recover very slowly and then survive (see Figure 6.11). That means, knocking down the long form would not kill the cell, but instead the small amount of long form protein would rescue the cell by itself.

Figure 6.11 Solution of Equation (6.3) with long form CIF set to be 0.0653. CIF, constant initial function. The short form CIF is set to 1. 88

I e og om is oe eesse ie e og om CI is geae a 5 e sysem wi eu o a osciaoy souio (see igue 1 Oeeessig o kockig ow e so om oes o aec e sysem (see igue 13-1

igue 6.2 Souio o Equaio (3 wi og om CI se o e 5 CI cosa iiia ucio e so om CI is se o 1

igue 6. Souio o Equaio (3 wi so om CI se o e 1 CI cosa iiia ucio e og om CI is se o 1 89

Figure 6.14 Souio o Equaio (3 wi so om CI se o e 5 CI cosa iiia ucio e og om CI is se o 1

Aso om igue 1-1 we ca see a we e og om eessio

ee eceases e so om eessio ee aso eceases a we e og

om eessio ee iceases e so om eessio ee aso iceases i e

is seea ous is agees wi e esus a e ack o Su( ucio is coeae wi e isaeaace o e so om Su( A a accumuaio o

e so om equies e wi-ye Su( oei (Auie a Simoeig 199

6.7.4 Basin of Attraction and Basin Boundary

e se ome y a e iiia ois i e ase sace o a yamica sysem wic is aace o a gie souio (eg a ie oi a imi cyce is cae e

asi o aacio o a souio May oiea yamica sysems ossess mui-sae souios a iee iiia coiios eogig o iee asi o aacio wi e aace o e coesoig souios (osso e a 1993 e

ouaies o e aious asis o aacio ae kow as asi ouaies e 0

eeece o souio eaio o e iiia coiios i ceai is-oe

oiea Es as ee iesigae (osso e a 1993

We oice a igue 11-1 ae e same eioic souios (eee o as e

imi cyce souio o iee iiia coiios y umeica simuaios (see

igue 15 We yoesie a is is a imi cyce e imi cyce souio is

eua sae as sow y umeica simuaios wi e souio aace o e

imi cyce wi iee CIs (see igue 15

igue 6. Souios o Equaio (3 wi iee CIs CI cosa iiia ucio o iee CIs e souios go o e same imi cyce souio

om igue 1-11 we ou a ee is a iucaio oi o e og

om CI i e so om CI is 1 a is i e og om CI is geae a

5 e sysem eus o e imi cyce souio oewise i wi go o eo o ay gie so om CI iee CIs o og om ae ie a e Equaio (3 is soe ase o e CIs e CI o og om was ou suc a wi a ie

eceasig o is aue e souio goes o eo wie wi a ie iceasig o is

aue e souio goes o e imi cyce souio o iee so om CI e

iucaio ois o e og om CI ca e geeae umeicay a ey ae

oe i igue 1 e cue is cae e asi ouay seaaig e ase sace io wo egios eo egio a osciaoy egio I e CI sas i e eo

egio e souios o e Equaio (3 ae aace o e oigi I e CI sas i e osciaoy egio e souios go o e imi cyce souio

igue 6.6 asi ouay o CI CI cosa iiia ucio e ouay is ou umeicay suc a wi ay gie so om CI a ie icease o og om CI wou ea o a osciaoy souio i e osciaoy egio A ie ecease o og om CI wou ea asymoicay o e eo souio e asi ouay aou eo is a eica ie iicae i e isee o

We aso ou wo ieesig eomea aou e asi ouay we e

CIs ae iee om eo e asi ouay is amos a saig ie wi e aio

(og/so aoacig as e CI iceases; we e CIs ae aou eo

e asi ouay ecomes a eica ie (see e amiicaio o i igue 1 wic meas a i e so om CI is ey sma e o ae e imi cyce souio e og om cosa I ees o geae a 13 ioogicay is 2 meas a ee cou e o so om oei u a sma amou o og om

oei is esseia o escue e ce om ceai ea

I ac i is easy o see a eo is a ie oi o Equaio (3 y

ieaiig e sysem aou eo we ca sow a eo is a sae ie oi I

ac we Equaio (3 is ieaie aou eo e ieaie equaio as o

eay em a ecomes e oowig OEs

e equaio as seay sae souio a eo a so eo is a sae ie oi o

Equaio (3

Wy is e asi ouay a eica ie ea eo?

I e ig sie o e is equaio o Equaio (3

(17

is egaie wi s 4 e sysem wi go o eo Se (1 = a soe o 1 ea e

eo I is ey iicu o soe e (1= aayicay So we o Equaio (17

is (see igue 17 We ou a ece e eo Equaio (17 as aoe

sma oo aou 5 y usig e ucio soe i Maa wi e iiia

coiio 5 o soe Equaio (17 we ge 1 = 131593 a (1

is esu agees ey we wi e umeica iig a e og om CI ees

o e geae a 13 o geeaig e osciaoy souio

igue 6. o o Equaio (17 Equaio (17 as wo osiie oos oe is cose o 1 a e oe is ey cose o e sma oos ca e see moe ceay i e isee o

I is si o cea wy e asi ouay aoaces e ie wi aio

(og/so aou we e CIs ae o ea eo o ee iee iiia

coiios o e so om so = 1 1 a 1 we cacuae e og om

iucaio ois o e CI wi accuacy o 11 igis Ιο = 13577

5139 a 553 eseciey (see igue 1- Wi e og

a so om CI (so o e souios go o eo We e as igis o o icease

y oe e souios go o e imi cyce souio 4

igue 6.8 e iucaio souios wi so om CI equa o 1 CI cosa iiia ucio e og om CI iucaio oi wi accuacy o 11 igis is geeae so a e souios go o e imi cyce souio we e as igi iceases y 1

igue 6. e iucaio souios wi so om CI equa o 1 CI cosa iiia ucio e og om CI iucaio oi wi accuacy o 11 igis is geeae so a e souios go o e imi cyce souio we e as igi iceases y 1 95

Figure 6.20 The bifurcation solutions with short form CIF equal to 100. CIF, constant initial function. The long form CIF bifurcation point with accuracy to 11 digits is generated so that the solutions go to the limit cycle solution when the last digit increases by 1.

From the observation shown in Figure 6.21 (three solutions in one plot), we hypothesize that for given short form CIF s , there exists a long form CIF 1 , such that with the CIF (1 , s ), the solution of Equation (6.3) would be a long form oscillation at very small magnitude and the short form goes to zero exponentially. Furthermore, with different CIF (1 , s ), the oscillatory solution of the long form would be the same except there is a small shift (see Figure 6.21). 96

Figure 6.21 Solutions with different long and short form CIFs. Three distinct solutions for different CIFs near the basin boundary. For these CIFs, the solutions for the long form transcript go to zero after a small amplitude transient oscillation. The solutions for the short form transcript go to zero rapidly.

We also notice that the small oscillation has a period of about nine, which is a slightly higher than the delay time of eight. Why does this happen? From Figure 6.21, we know that when the small oscillation occurs, s(t) goes to zero, 1(t) is very small with a maximum around 0.005, and K is very large compared with 1(t). Thus the first equation of Equation (6.3) will can be approximated by the following equation:

(6.18)

Equation (6.18) has a fixed point at Ιο = 0.00413167192288. When we linearize the equation around the stationary point, we obtain the following linear DDE:

(6.19) 97

om igue we kow a i Equaio (1 as eioic souios aou e ie oi a e ieaie E (19 as e aamee coeicies

ocae o i e seco quaa wic coesos o e cue aamee i e iea ((k+15π (k+π e e eio mus e ewee τo a το 3

ecause e eio o e souio is T = π wee π ν 1 5π is agees wi e umeica simuaio ey we

y seig e ig-a sie o Equaio (3 o eo wi e ucio

sqoi i Maa we i aoe seay sae souio umeica souio 1

(19 1157 Wi is CI we sou ae goe e cosa seay sae souio u ue o e umeica eos e souio we o e same imi cyce ae some ime (see igue -3 wic meas a e ie oi 1 is o sicy sae

Figure 6.22 Souios wi CI saig ea e ie oi is ie oi a (19593133 11571119551 is a usae oi o Equaio (3 98

igue 3 ase ae o e imi cyce souio e ie oi (19593133 11571119551 iicae y e cice is ea e imi cyce souio

Summaie e yamics o e souios o e CI cases (see igue we e CIs ae ocae o e e o e asi ouay e souio goes o ( wic is a sae seay sae souio o e Equaio (3; we e iiia coiios ae ocae o e ouay coiio e og om goes o a ey sma osciaio aou 13 a e so om goes o eo e oi (13

579 1-10 is aoe seay sae souio a i is usae; we e iiia coiios ae ocae o e ig o e asi ouay e souio goes o a imi cyce ea e ie oi (19593133 11571119551 wic is

euay sae 99

Figure 6.24 Dynamical behavior of Equation (6.3) with optimal parameters. The zero region is to the left of the basin boundary, and the oscillatory region is to the right of the basin boundary. Three fixed points of Equation (6.3) are indicated by the small circles. If the CIF starts in the zero region, the solution is attracted to the origin; if the CIF starts in the oscillatory region, the solution is attracted to the limit cycle solution; if the CIF starts on the basin boundary, the solution for the long form will oscillate with a small amplitude, and the solution for the short form transcript goes to zero rapidly.

There still remain two questions to be asked theoretically based on the above observation of the system with the optimized parameters: why should the system have the same limit cycle or zero solution no matter where the initial conditions located in the oscillatory region, basin boundary, or zero region; how should we find the long form bifurcation CIF theoretically with the given short form CIF.

6.8 Conclusions and Discussion

In summary, a mathematical model has been proposed to simulate the regulation between two different transcripts from gene cstf-77. The model is able to produce 1

souios a aoimae e eeimea aa e caaiiy o e moe o

escie e osee osciaios as ee oe ese osciaios ca aso eai

wy some og om mA cao e eece ee i e mA is ecessay o

e ce om e SAGE aa is moe ca aso e aie o oe gees wi

simia oyaeyaio aes suc as oy(Α oymeise (A Muie oms

o oy(Α oymeise (As I1I III cA ae ee ou i uma a mouse

a auo-eguaio may aso eis i e gee eessio eguaio ewee iee

isooms (ao a Maey 199; ee e a

e sysem sows a ous a sae osciaio ous meas a e

ecise iiia coiios ae o ciica o osciaio o occu i e og om is

oe some eso I imies a e oyaeyaio ocess is ey sae

suggesig is imoace i e ce cyce a e Cs-77 og om oei is

esseia o e ce o ucio

e yamica sysem wi e oimie aamee as ee suie i eai

e cosa iiia ucios o e moe e o e iucaio o e sae souios

wic agees we wi e ac a kockig ow e oyaeyaio acos wou

ki e ce a a ceai amou o og om is esseia o escue e ce om

ea e asi ouay as ee ou umeicay a i sowe a ey

ieesig ae ee asis o aacio ae ee ou imi cyce eo a

eo-imi cyce wi e so om aoacig eo a og om aoacig a

ey sma amiue osciaio

ee ie ois ae ee ou o Equaio (3 I e eay is eo e sysem ecomes oiay ieeia equaios (OEs a e saiiy ca e aaye y cacuaig e aco mai a ese ois e eigeaues ca e

ou o e OEs e oi 1( is a sae ie oi e oi (13 0

579 1 -i ° is a sae oi a e oi 3 (19 1157 is a sae ie

oi I imies a e ime eay is equie o e sysem o geeae e osciaoy souios

y aig some moe acos suc as Cs- e moe ca e eee o moe come iosyeic aways e ieiicaio o e comee aeaie

oyaeyaio mecaism is a come ask ue e comeiy o e sysem a

e imie kowege a we ae aou e moe

e eeimea aa is ey imie a is oi owee oce e aa is aaiae is wok wou oie a comeesie a cosise maemaica

amewok o uesaig e aeaie oyaeyaio ocess CHAPTER 7

SUMMARY AND FUTURE WORK

aious oems eae o e oyaeyaio ocess ae ee iesigae i

is wok We A oymeise II is o e gee omoe egio a

syesies e e-mA ow oes e ce kow i e ascie ui is eoug

o e ce o ucio? eoeicay i ceaage ca occu a ay ocaio o e

e-mA a eas as may ascis as e eg o e gee ca e geeae

u wy ca oy a ew ascis e ou i e ce o oe gee ase o e

cA/ES? Oe easo cou e a some ascis ae o sae a ey ae

egae ey soo ae eig ascie suc a e cue ecoogy ca o

ee eec em ee may e some oe easos

I as ee ou a e ceaage cao occu aywee aog a e-mA

asci a sigas ae equie o e ceaage a oyaeyaio y a

aaysis o e sequece comosiio aou e ceaage sies some so sequeces

e cis-eemes cou e ou o occu moe oe a oe aiay

comiaios o uceoies I ese cis-eemes ae e eemia acos o e

ceaage a oyaeyaio (C e e occuece o C ca e eae as a

ucio o ese cis-eemes e cassiicaio ca e aie a is eas o e

is a o is wok (Cae 3 Α o o simia wok as ee oe o eic e

oy(Α sies a ou meos aciee ee accuacy a sesiiiy a a commoy use oe Sice e eease o e is esio o e wok some imoemes i accuacy ae ee mae owee e ase osiies a ase

egaies si eis wic meas a o some sequeces i ca e ceae a

oyaeyae ase o cA/ES u ey ae eice o o coai ay oy(Α

102 103

sies ase o e cis-eemes eeoe C mus ee o some oe acos a

e oe-imesioa sequece iomaio is o eoug o e eicio us o

e uue wok ca e oe o e eicio o oy(Α sies u ee wi ee o

e ae some moe ioogica acos suc as e wo-imesioa e-mA

sucue o e sequece

Aoe aua quesio may e aske i ee ae muie ceaage sies

wii e e-mA ow oes e ce kow wic oy(Α sie wou e use o

e cue ce coiio? o ey geeae equa umes o ascis o eac

oy(Α sie o o ey ee some sies moe a oes? Some eiece as ee

ou a e usage o oy(Α sies is issue-seciic a is i oe issue a oy(Α sie may e use moe oe a oe oy(Α sies u i aoe issue some

oy(Α sies may e seom use e aaiaiiy o age amou o gee eessio aa suc as micoaay a SAGE aa makes i ossie o aaye e

eaie eessio ees o iee ascis om e same gee is eas o e seco a o is wok (Cae Sice mos o e iee ascis use e same omoe egio o e gee i is yoesie a i e eessio ee o oe

asci goes u oe ascis om e same gee may go ow a ice ese

y aayig e SAGE aa we ou a is ki o egaie eguaio is o commo i uma Some sigiica gees ae icke ou o coaiig muie

ascis amog wic some o em ae oosie eessio ees iee o e

1 sigiica uigees ae ou o coai aeaie oy(Α sies coesoig o

e SAGE ags is may sugges a e aeaie oyaeyaio may e oe

aco a geeaes e egaiey eguae ascis ese gees may e ey useu o ioogiss a ey ee o e eiie y ioogica eeimes ue

Aso some gees ae osiie eguaio ewee iee ascis a mos o 104 gees ae aom coeaio amog e ascis A ese sugges a iee gees may ae iee aways o geeaig e aeaie ascis I e

uue i is ecessay o suy e away o asci eguaio wii e same gee o uesa e aeaie ascis geeae om aeaie sicig o aeaie oyaeyaio o aciee is oe gee was seece o is suy wic eas o e as a o is wok (e Cae

Gee csj3 was seece ecause e oei ouc o is gee is a ceaage a oyaeyaio aco a is gee eessio cou om a auo-eguaio y geeaig iee ascis e gee eguaio ewok as ee suie wiey a e asci eguaio ewok as eceie ie aeio e easos cou

e a some ascis ae o sae a ca o e eece e A ascis ae o imoa comae o e oei wic is e ucioa ui o e ce a some oei oucs om ese ascis ae o ee ou o ae ay

ucio Maemaica moeig is a ey imoa oo o uesa e come

yamics ieeia-eay equaios ae ee aie o suy e ioogica sysems moe oe ecey a ey wi e wiey use i e uue

A oe mecaism as ee oose o eai e aeaie

oyaeyaio a a wo-imesioa E as ee se u o simuae e ocess

Some ieesig esus ae ee osee o e oima aamees esimae y miimiaio o e eos o eimiay eeimea aa a e simuaio aa

We ou a e og om oei is esseia o e ce o suie a e so

om oei is o a imoa is agees wi e cue uesaig o e cstf-77 oei oucs Aso om e simuaio e og a so om mA

ascis osciae a someimes ey ae ey ow eessio ees is may eai wy i some SAGE iaies e og om is o eecae ee oug i is 105 esseia o ces ee is a asi ouay o e og a so om cosa iiia ucios e eoeica aaysis o e asi ouay wou e e uue wok e eeimea aa ae o suicie a is ime a eos may eis I e

aa wee comee is moe may oie moe isig io e mecaism e moe assume a oy e oe oei aco wou aec e oyaeyaio

Some oe acos suc as Cs- cou e ae io e moe i e uue i e

aa wee aaiae

Maemaics is come a ioogy is ee moe come i some sese

We eiee a ioogy ca e ee uesoo y ayig maemaics a maemaics ca e use owa y soig oems comig om ioogy APPENDIX A

STATISTICAL ALGORITHMS AND SIGNIFICANCE

A.1 Markov Chain Model

A Mako cai is a sequece o aom aiaes wi e oeies a e ese sae is oy eee o e as seea saes omay

(Χη+ι = χ Ι Χ =χ Χ1 _ χ1 Χ0 = o = (Χn+1 _ χ Ι Χ = χ Χη-+ι —η-+ι

Table A.1 is Oe Mako Cai asiio Mai

Ρ asCue Α Τ G C

Α ΡΑΑ A AG PAC

Τ A Ρ77 G c

G GA G GG Gc

C PcA Ρcτ CG cc

A G a C eese e ou uceoies Ρ ΜΝ is e oaiiy a e cue uceoie is gie a e eious uceoie was M

e ossie aues o Χ; om a couae se S cae e sae sace o e cai

(i e aoe omua is e oe o e Mako cai e oaiiy is cae e

asiio oaiiy I e sae sace is iie e asiio oaiiy isiuio ca

e eesee y a mai cae e asiio mai o eame cosie e

is-oe Mako cai o e uceoie sequece e asiio mai ca e

eesee y a mai (see ae A1 wee PMN is e oaiiy a e

106 0 cue uceoie is gie a e eious uceoie was M o e oice a

Σ ΡΜΝ =1 o ay gie M N

Α.2 iea iscimia Aaysis a Quaaic iscimia Aaysis

iscimia aaysis is a saisica oceue i wic a iiiua uaee iem (es

aa se is u io a gou ase o eiousy aee iems (aiig aa se o eame i ee ae wo gous o aee aa (oseaio aa oe eesee y

e ack o a e oe eesee y e wie o (igue Α1 e e quesio o

ow o seaae em a o wic gou a ew uaee iem make y aseisk i

igue Α1 is assige ae o e aesse ese quesios wee aswee y A

ise (ise 193 aou 7 yeas ago wo sowe a a quaaic ecisio ucio gie y e oowig

Σ (=sg( (χ—mί Σi-1 (χ— — (χ—m -1 (χ—m +1ίΣ1 (Α1 was e oima souio a cou iscimiae ewee wo ouaios i e wo

ouaios a oma isiuios (m 1 Σ 1 (m2 Σ e aiues o e wo

ouaio ae imesioa ecos wi mea ecos m 1 m2 a coaiace maices Σ1 Σ . It says a we ( is equa o 1 e eco is eoge o oe

ouaio; oewise i is eoge o aoe ouaio 08

igue A. iscimia oem e ack os a wie os eese wo gous o aa e aseisk eeses e uaee aa

I Σ 1 = Σ =Σ Equaio (A1 egeeaes o e oowig iea ucio

(A

I ac Equaio (A is a ye-ae a seaaes e wo ouaios a

us is meo is cae iea iscimia Aaysis (A e ucio i Equaio

(A1 is a quaaic ucio a us is meo is cae Quaaic iscimia

Aaysis (QA A wo-imesio A a QA ae simy eesee i igue A

I ca e see a usig iee iscimia ucio e same aa may e assige o

iee gous o eame e uaee aa make y aseisk i igue A wou

e aee as ack o y A a wie o y QA 109

igue A A a QA A iea iscimia aaysis; QA quaaic iscimia aaysis A uses a iea ucio o seaae e wo gous o aa a QA uses a quaaic ucio o seaae e wo gous o aa

A3 Suo eco Macie

A suo eco macie (SM is a se o agoims use o cassiicaio a

egessio (Coes a aik 1995 e geea agoim icues wo ses e is se is o ma e iu sace o a ige imesioa eaue sace oug some maig (cae a kee ucio e i e eaue sace i a ye-ae a seaaes e aa io iee gous (see igue A3

igue A3 Suo eco macie agoim Iu sace e oigia aa sace; eaue sace a ige imesioa sace mae om e iu sace oug some kee ucios 11

e oima seaaig ye-ae is seece y maimiig e magi

ewee e casses coses ois e ois yig o e ouaies ae cae suo

ecos Usig a kee ucio a SM is a aeaie aiig meo o

oyomia ucios aia asis ucios ( a sigmoi ucios ee ae

seea omuaios a kees we usig a SM (Suges a Smoa 199 ee a

C-cassiicaio wi a kee is iey escie wic ca e maemaicay

omuae y

(Α3

Wee α is a eco o coeicie o e oimie e is e eco o a oes

C is a eeie cosa Q is a n x n osiie semi-eiie mai wi e comoe eie y

wee

is e kee ucio y is aoe eeie cosa e ecisio ucio is gie y e oowig

e suo eco macie agoim as ee imemee io sowae

iaies y may gous e mos oua ackage is ISM (Cag a i 1 wie i C++ a aa a i is use i is wok 111

A.4 Statistical Significance

ue osiie (ΤΡ e ume o aa eice osiie a e aa ise is osiie

(e eicio is ue

ase egaie ( e ume o aa eice egaie a e aa ise is

osiie (e eicio is ase aso cae ye II eo

ase osiie ( e ume o aa eice osiie a e aa ise is

egaie (e eicio is ase aso cae ye I eo

ue egaie ( e ume o aa eice egaie a e aa ise is

egaie (e eicio is ue

Ρ ΤΡ Sesiiiy (S S = Seciiciy (S S = + +

TPxTN—FPxFN Accuacy (CC CC = (Ρ + (Ρ + ( + (Ν + APPENDIX B

PROGRAMMING LANGUAGES

Β. E

Practical Extraction and Report Language (PERL) is a script language and is very famous

for its strong regular expression ability. Originally, it was developed for text manipulation. But now, it has wide usage including system administration, web

development, network programming. It is very useful in comparing gene sequence,

finding the gene patterns and retrieving sequences. Also, there is a biological module in

PERL called Bioperl (Stajich et al. 2002). You can retrieve sequence from database such

as GenBank and SwissProt just by writing several lines scripts. Also, via Bioperl, we can

execute analysis such as BLAST, ClustalW, etc.

B.2 R Language

R is a language and environment for statistical computing and graphics. It is similar to the

award-winning S system, which was developed at Bell Laboratories by John Chambers et

al (Chambers 1998). It provides a wide variety of statistical and graphical techniques

(linear and nonlinear modeling, statistical tests, time series analysis, classification, and

clustering, etc.). Most importantly, it's free to download (http://www.r-project.org ) and install and so it is widely used by university and research institute. Also, there are a lot of packages free to install and easy to use. Support Vector Machines package e1071 is also

included in R (Dimitriadou et al. 2006).

112 AEI C

SAGE MAEIAS

C. SAGE Geie

e SAGE aa wee owoae om SAGE Geie (oo e a see

//ciigo/u/SAGE/UMA e aaase esio o is suy is uae o 7/11/ e oowig ies ae owoae om e see

"siaies" wic is e uma SAGE iaies iomaio ie icuig o

og a so iaies; "s_oges_gee" wic is e maig ie o og

SAGE ag o e es uigee; "s_ogequecies" wic is equecy aa ie

o og SAGE iaies; "s_ogma" wic is e maig ie o og ag o accessio ume a e ak o e maig; "s_sοes gee" wic is e maig ie o so SAGE ag o e es uigee; "s_soequecies" wic is

e equecy aa ie o so SAGE iaies; "s_sοma" wic is e maig ie o so ag o accessio ume a e ak o e maig e uigee ui ume is 19

C.2 Meos i SAGE Aaysis

M. e asci auace is omaie o ascis e miio (M o comaiso ewee sames easo coeaio is ase o M

es o ieeece. e es saisic (e 3 is gie y

η - =Ί 1— wee is e easo coeaio coeicie ewee wo ags a is e egee o

eeom e ume o iaies a a eas oe o e wo ags as eesse

114

Fisher-Shuffle. o a aay wi m iems is meo ca geeae a aom

aay ase o is aay e ses ae e oowig sca e aay om e is

eeme o e as eeme; o eac ocaio i o e aay geeae a aom iege

ess a o equa o e eg o e aay a ecage e iem i a Ae e

scaig a ew aay is geeae wi m iems is meo is use o e

aomiaio o SAGE aa

C.3 Sixty-one Significant Unigenes with Negatively Regulated Transcripts

Siy-oe sigiica uigees ae geeae ase o e -aue ess a 1 a

e eaie meos ae escie i Cae 4.

Table C.1 Siy-oe Sigiica Uigees wi egaiey eguae ascis

s1133 SS1 Siga sequece eceo aa CACCAAGAAGGCAAC AAACGCGAC - 15(1115 -15 s135 MGC539 yoeica oei MGC539 GGACCGAGGGGCGGAG CCAAAAAACAAG -33 3(11193 -53 s155 EOI ei eceo oeaig asci-ike 1 AGAGAGAGCGGGA GCAGGCACAAAGC -3 19(1591373 -711 s153 A1C AA o iig oei (-associae aco A oymeise I C 11ka CCAAGGGAGACAA GCCAAGCACCACCCCA - 13(1577 -373 s1551 UI eeogeeous ucea iouceooei U-ike 1 CGGCAGAAAGAAA GCACCCCAGCAGGAA -7 3(775 -375 s111 ΤA3 yoi omoe eceo associae oei 3 ACACGGGAGGC AGGGGGAAAAGAA -3 7(9177 -35 GAAAAACGAAG GCGGGCCCGAG -39 179(13793 -579 s11 KAI Kayoei aa 1 (imoi aa 5 AGAAAGCAGCC ACCGCAACCCAAA - 153(71153 -397 s113 Ε-3Κ ikey ooog o mouse uiquii-cougaig eyme GCACACGGGACAGGC GGAGAGGAAGCC -55 1(171911 -359 s177 ACSI oei kiase C a casei kiase susae i euos CCAGGCACG GAGAGAGGAAA -3 7(1157 -559 s19951 ΑΝΒ A iig oei GGGAAAAAAAG GCGCGGGCGAGGAGG -15 5(1719 -1 GCGCGGGCGAGGAGG CCGAAAGA - 9(7 -35 115

s335 YC ucea asciio aco Y gamma CCGGGGGCCGAGAC AAAGCAAAAACCA -7 7(155111 -91 s155 AI Aiae omoog (osoia GAACGGCCC AAACCAAAC -7 1(1135111 -195 s AG1 AG1 auoagy 1-ike (S ceeisiae GAAAAGAAAAC GGGCACACCGAA -37 13(3155 -555 s9 ΤΜΕΜ asmemae oei AGACAGCGACAACA AAAGGGG -39 15(11175 -59 s393 UΒΕ1 Uiquii-cougaig eyme ΕI (UΒC9 omoog yeas CCCACCGGCAGAG AGGGCCCGGAAAG -177 (3193179 -7955 s335 GΒ3 G iig oei 3 (miocoia GCAAGGGC CGGAGCCGACC i-33 173(13939 -51 s337 133 3 e-mA ocessig aco 3 (yeas AGGGGAAAAAAC GGCCAGGAC -3 197(17175 -137 s31 E1 ee ime eai eoucease 1 GGGCACCAGCACCAC AGGGGCACAGCCC -39 1(1311 -579 >Ηs.6804 S2 iosoma oei S2 AGGCGGACGAGA AAACGCAGAGC -31 (1111 -5 AGGCGGACGAGA GAGAGCCCAGAAGGG - (1111 -53 AAACGCAGAGC AAGACAAGCCCGGA -31 19(11157 -5 AAGACAAGCCCGGA GAGAGCCCAGAAGGG -7 (1511115 -3 s3717 CC1 CC1 ce iisio cyce 1 omoog (S ceeisiae GCAAGAAGCCCAC AGAGAGACACA -3 1(111757 -55 s359 UΒΕΒ Uiquii-cougaig eyme E (A omoog CCCCAGAAGCAAG ACAGAAAACACA - 1(9151 -77 s37755 Co19 yoeica oei C577 ι CGGGCAGCACCGGGC ACCAGGAAGCA -3 1(1317 -973 s7 CS Coaco equie o S 1 asciioa aciaio suui 15ka CGCCCG AAAGCGGGAGG - 11(19 -531 s131 AC 1 as-eae C3 ouium oi susae 1 (o amiy sma G iig oei ac AGACAAAAACC GCAAGGAGAGGGC -19 (3935 -717 s3137 Co55 Comosome oe eaig ame 55 GAGGCCACCGAC AAGAGAAGAAAAG - 17(11 -37 >s.4288 4 Miogeaciae oei kiase kiase kiase CGCCGGAACACCAC AAAGAACAAGAAA -199 (1313 -317 s3375 ΕΙG1 Eukayoic asaio iiiaio aco gamma 1 GACACACAGAGGCCCG AACAAAAAAAAAA -157 1(151913 -573 s3555 U1 U omai coaiig 1 GGCCGCCCGGG GGCCCGCCCGGG -3 9(13193 -555 116

s379 7 omoomai coaiig 7 GCGACGCAGAAAAC GAAAACAACGGGA -35 13(9515 -5757 s59 CSKIAI Casei kiase 1 aa 1 CAGGAAAAAAAAG AACACACGCCA -3 1(1 -93 >s.4600 SMACEI yoeica oei MGC462 AGAGAGAG AGCCACCGCACCCAGCC -13 35(1531 -337 AGCCACCGCACCCAGCC AGCACAGCCGCGA -13 3(31111 -3357 s7 IG 1 Iiio o gow amiy meme 1 GACAAAGCCCGGAGAA CGCACACCAA -97 13(933 -7979 s737 CC Ce iisio cyce (G iig oei 5ka CACCGGGAGACAAG CCAACGAA -1 (1335 -953 CACCGGGAGACAAG GAAAGACCAAGCA -75 3(13115 -35 s9 GUOK eoyguaosie kiase AAGCAGGCGGAC AGCAGGCGGAC -5 9(1191555 - s75 IG eiy-oy isomease G (cycoii G GGGACAAGA AGAAAAACAAAAAAAAA -3 17(19135 -51 s7175 CE ΝΕ-cougaig eyme ACGCCAGAGAG ACGAAC -79 3(17119 -11 s73 eeogeeous ucea iouceooei (AU-ic eeme A iig oei 1 37ka GACCACGCAGC AAAAAGCAGGAA -5 1(7717 -7375 s7 199 yoeica oei 199 AGGCAGAGCGAC CACACCAGACCC -9 3(31713 -37 s59 S1 Sicig aco 1 AAGGACGGACA CCGCCCCGGGAGCC -15 3(315915 -595 s5759 ΑΤΡΑ Aase Ca++ asoig caiac musce sow wic AGACCGGAAA GCAGGAAAAAA -3 3(111 -3137 ό s577 SΡ3 Siga eie eiase 3 - GGCCCAGA CCCCACCCAAGA -35 197(1171 -73 s5153 C9333 yoeica oei C11 ACCGAGAA GAGGGGCCCCAC -3 1(1 -5573 s515515 KEI KE (ys-As-Gu-eu eoasmic eicuum oei _ eeio eceo 1 GACAGAACGA GGAGCAAGACCGGCGA - (11511 -39 s51539 - ΝΡΑ3 eeogeeous ucea iouceooei Α3 GAGGAAGCAAAC AAAGGGGGCAGGG - (15717 -315 s517 SO SO A iig oei GAAAGAAACCA AAAACAGCAAGACGA i-3 17(719 -15 s51 ΑΝΚI7 Akyi eea omai 17 GAAGGGGGCGCAC GAAAAAAAAAG -39 177(99179 -553 GAAAAAAAAAG AACAGGGCAAGAA -35 1(171155 -513 117

s59 A-1 Mao isocomaiiiy come cass II ea 3 GAAAAAAAAAAAAAAAA GCAGCGACAGGAC -195 3(39 -37 s553 C1o119 Comosome 1 oe eaig ame 119 GGGGGCACCGAG AAAGGAAAGAGAA -3 (111911 -33 s597 C aosi-coaiig oei GAAAAGGACAAAA CGCGCGCGCCGC -1 (3515 17 -3195 s5979 ΒΤ3 asic asciio aco 3 CGAGACGAAGCAGCG CGAGACAAAGCAGCG -5 5(337 -31 s59957 SEC3 SEC3-ike (S ceeisiae CAAAGGAAGC AGGCAAGCCGGCCA -3 15(79 -91 s53 UA5 Uiquii A-5 esiue iosoma oei usio ouc 1 CAGACGGAAGAC GGAAGCCCCG -5 (3151 -137 s5311 ΒΜ5 A iig moi oei 5 CGAGACAAA GACGAGAAGAAGC -31 19(11173 -59 s53179 A 1 A 1 omoog (S ome CCGAAGCCAGCAC AAAAAAGCC -55 17(131 -3733 s533 SEC3 A seeocyseie associae oei AGGAGGGAGAGAGA GGGCCACGGGAG - 19(17111 -37737 s533 S13 Suessio o umoigeiciy 13 (coo cacioma (s7 ieacig oei CAGGACCAGAAGA CAAACCAA -13 37(3117519 - s531 ΑΤΡC1 Aase Ca++ asoig ye C meme 1 AGCGACACACA GAAGCCAGCAAC -3 15(1 -531 s59 ΜGC yoeica oei ΜGC AAAGGGAGGAG AGGGACACACA -531 17(79757 -755 s597 ΜGC137 yoeica oei MGC137 GGGGCACCCGCGCCCC GGGGCAGCCGCGCCCC i-37 (1119 -33 s737 Simia o IE-1 eese asciase omoog GACGGACCAAAACA AAAAGCAAAGG - (1315 -39 s999 ΡΑΟ eeogeeous ucea iouceooei AO AAGAGCGGCGGCGGCGG CACAAAAACG -1 157(191 -11 e ow egiig wi "" i e is coum is e uigee i a e gee esciio e ows egiig wi ags ae e ag iomaio o e uigee i gie i e eious ie e is wo coums ae e egaiey eguae ags e i coum is e easo coeaio o e M aa o e wo ags e ou coum aa is omae ike s(m wee s is e oa iay ume a a eas oe ag eesse i m is e eesse iay ume o e is coum ag is e eesse iay ume o e seco coum ag a is e eesse iay ume o o ags e i coum is e -aues e ies i o yeace ae e gees wi ascis ocae i iee sas o comosomes 118

C.4 Transcript Structure of Eight Significant Unigenes

asci sucues ae geeae o eig we-kow sigiica uigees a sow i igue C1 119 1

igue C1 e asci sucue o eig sigiica uigees e is ie eeses e comosome iomaio o e gee a e oowig ies eese e asci sucue geeae om is gee e SAGE ag is iicae y e u-iage a e accessio ume o e asci is iicae ue e is eo o e asci (A Ηs531 ( Ηs517 (C Ηs51539 ( Ηs73 (E Ηs531 ( Ηs5979 (G s5759 ( Ηs337 EEECES

(199 "Geome sequece o e emaoe C eegas a aom o iesigaig ioogy" Science (539 1-

Aamso Τ E C Su a Η ice (5 "ucioa couig o ceaage a oyaeyaio wi asciio o mA" J Biol Chem (37 3-71

Aews Ε M a iMaio (1993 "ieacy o oyaeyaio sie usage y oie aiomaius i asome mouse ces" J Virol 7(1 775-1

Aoiou M E e oe E Saoouou A Imam a Gose (1995 "ΤΒΡ iig a e ae o asciio iiiaio om e uma ea-goi gee" Nucleic Acids Res 3(17 373-

Ai G K M oos S agga C Micaek a Wius ( "owseam sequece eemes wi iee aiiies o e / oei iuece e ocessig eiciecy o mammaia oyaeyaio sigas" Nucleic Acids Res 3( 1-5

Auie A a M Simoeig (199 "Auoeguaio a e ee o mA 3 e omaio o e suesso o oke gee o Drosophila melanogaster is cosee i Drosophila virilis." Proc Natl Acad Sci USA 95( 13-7

a Y uo a G G Camicae (199 "oyaeyaio a asciio emiaio i gee cosucs coaiig muie aem oyaeyaio sigas" Nucleic Acids Res (1 11-

ecque C S aco euy ouicau a O Gaio ( "Sog-associaio-ue miig o age-scae gee-eessio aa aaysis a case suy o uma SAGE aa" Genome Biol 3(1 ΕSΕΑCΗ7

ee C M E mkow amse K C Oia Q u ueia A O Sigeoka Ocs a Cace (1 "A ae oyaeyaio siga muaio o e ΟΧ3 gee (AAUAAA--AAUGAA eas o e AE syome" Immunogenetics 53( 35-9

ieo S E Wae C Sue-Caoaa a W Kee (1991 "uiicaio o e ceaage a oyaeyaio aco ioe i e 3-ocessig o messege A ecusos" JBiol Chem (9 197-7 oo K Ε C Osoio S Geeu C Scaee Soemake K oyak Moi K ueow Sauseg e a ( "A aaomy o oma a maiga gee eessio" Proc Natl Acad Sci USA 99(17 117-9

ow S iey a Cue (1991 "Eicie oyaeyaio wii e uma immuoeiciecy ius ye 1 og emia eea equies akig U3-seciic sequeces" J Virol 5( 33-3

121 1

uce S W ige a M eeso (3 "-ce a asma-ce sicig ieeces a oeia oe i eguae immuogoui A ocessig" Rna 9(1 1-73

uaowski S (5 "Coecios ewee mA 3 e ocessig a asciio emiaio" Curr Opin Cell Biol 17(3): 57-1

Suges C C a A Smoa (199 Advances in Kernel Methods-Support Vector Learning. CamigeMA MI ess

Cao O a Maey (1 "Eouioaiy cosee ieacio ewee Cs- a C iks asciio oyaeyaio a emiaio" Mol Cell 7(5 113-3

Caswe S a C Awie (199 "Eiciecy o uiiaio o e simia ius ae oyaeyaio sie eecs o useam sequeces" Mol Cell Biol 9(10): -5

Caegi S Cew a A Kaie ( "iseig o siece a uesaig osese eoic muaios a aec sicig" Nat Rev Genet 3( 5-9

Caseo-aco A uge M Woeo C Smi A Moeia a ouoo ( "oyyimiie ac iig oei mouaes eiciecy o oyaeyaio" Mol Cell Biol 24(10): 17-3

Cames M (199 Programming with Data: A Guide to the S Language. ew Yok Sige-eag

Cag C-C a C- i (1 "LIBSVM : a library for support vector machines." om , (isie e 1

Cao C A ami S Kim uag a G Maiso (1999 "Assemy o e ceaage a oyaeyaio aaaus equies aou 1 secos i io a is ase o sog a o weak oy(Α sies" Mol Cell Biol 19( 55-

Ceg Y M Miua a ia ( "eicio o mA oyaeyaio sies y suo eco macie" Bioinformatics (19 3-5

Coga a Maey (1997 "Mecaism a eguaio o mA oyaeyaio" Genes Dev 11(21): 755- Coo C C Squies a C Squies (1995 "Coo o A asciio i Esceicia coi" Microbiol Rev 59(4): 3-5

Coes C a aik (1995 "Suo-eco ewoks" Machine Learning 20: 73-97

Cowi S Saaya a S omso (1997 "KΑG a coe ase o coiuousy imee si-oe uge-Kua meos o e souio o 13

sae-eee ucioa ieeia equaios" Applied Numerical Mathematics (-3 319-33

Coe M W eug E Gaso a aimoe (5 "Acieig saiiy o iooysaccaie-iuce -kaa aciaio" Science 39(57 15-7

Cick (197 "Cea ogma o ioogy" Nature 7 51-3

Cui Y a C eis (3 "I io eiece a eecs i e asciioa eogaio acos ΙΙS a S5 eace useam oy(Α sie uiiaio" Mol Cell Biol 3(1 77-91

e Moo C Meie a S issee (5 "Mecaisms o asaioa coo y e 3 U i eeome a ieeiaio" Semin Cell Dev Biol 16(1): 9-5 e ies U uegsegge W ue A ieei age a W Kee ( "uma e-mA ceaage aco 11(m coais omoogs o yeas oeis a iges wo oe ceaage acos" Embo J 19(1 595-9

eome M a C Coe (19 "aes o oyaeyaio sie seecio i gee cosucs coaiig muie oyaeyaio sigas" Mol Cell Biol (11 9-39 eao a M Imeiae (199 "Sequeces useam o AAUAAA iuece oy(Α sie seecio i a come asciio ui" Mol Cell Biol 9(11 951-1

iekma (1995 Delay Equations: Functional, Complex, & Nonlinear Analysis. ew Yok Sige

imiiaou E K oik eisc Meye a A Weigesse ( "Support Vector Machines, the interface to libsvm in package e1071." om , (isie e 1

omiski C Yag a W Mau (5 "e oyaeyaio aco CS-73 is ioe i isoe-e-mA ocessig" Cell 13(1 37-

Ewas-Gie G K eai a C Micaek (1997 "Aeaie oy(Α sie seecio i come asciio uis meas o a e?" Nucleic Acids Res 5(13 57-1 Eise M Sema ow a osei (199 "Cuse aaysis a isay o geome-wie eessio aes" Proc Nat! Acad Sci U S A 95(5 13-

Eowi M a S eie ( "A syeic osciaoy ewok o asciioa eguaos" Nature 3(77 335- 1

Egeogs K uyaia a oose ( "umeica iucaio aaysis o eay ieeia equaios usig E-IOO" ACM Trans. Math. Softw. %@ 0098-3500 (1 1-1

a C E Maa Wage a yso (3 Computational Cell Biology, Sige ise A (193 "e use o muie measuemes i aoomic oems" Ann. Eugenics 7 111-3

eiscma M Aams Ο Wie A Cayo E Kikess A Keaage C u om A ougey e a (1995 "Woe-geome aom sequecig a assemy o aemoius iueae " Science 9(53 9-51

oes K Aeai a A G u ( "A Aaiosis i omoog ieacs wi A a oies coceua iks wi a ume o oe oyaeyaio aco suuis" JBiol Chem 1(1 17- iscmeye A A a oo K Ooe A Gueeio ake a C ie ( "A mA sueiace mecaism a eimiaes ascis ackig emiaio coos" Science 95(553 5-1

Gawae M oia A a a Sig ( "osoia Se-ea oei meiaes oyaeyaio swicig i e emae gemie" Embo J 5( 13-7 Gee A W Kasak a A Saio ( "Sucua ieeiaio o e I-1 oyA sigas" Biomol StructDyn 23(4): 17-

Geig U ee G eu-Yiik usoee ee M W ee a A E Kuoik (1 "Icease eiciecy o mA 3 e omaio a ew geeic mecaism coiuig o eeiay omoiia" Nat Genet ( 39-9 Giaia C a is (1993 "oymeise ocessiiy a emiaio o osoia ea sock gees" JBiol Chem (3 3-11

Gimai G M a eis (1991 "Moecua aayses o wo oy(A sie-ocessig acos a eemie e ecogiio a eiciecy o ceaage o e e-mA" Mol Cell Biol 11(5): 3- Goeau A G ae ussey W ais uo ema Gaie oeise C acq e a (199 "ie wi gees" Science 7(57 5 53-7 Goee A (1995 "A moe o cicaia osciaios i e osoia eio oei (E" Proc Biol Sci 1(13 319-

Goowi C (195 "Osciaoy eaio i eymaic coo ocesses" Adv Enzyme Regul 3: 5-3 2

Goowi C (19 "A eaime moe o ime eyme syeses i aceia" Nature 20(22: 79-1

Gae Η (3 "aiaios i yeas 3-οcessig cis-eemes coeae wi asci saiiy" Trends Genet (: 73-

Gae Η C Cao S C Mo a Τ Smi (1999 "I siico eecio o coo sigas mA 3-e-οcessig sequeces i iese secies" Proc Natl Acad Sci USA 6(24: 155-

Gae G McAise a Τ Smi ( "oaiisic eicio o Saccaomyces ceeisiae mA 3-οcessig sies" Nucleic Acids Res 3( 151-

Gaeey a G M Gimai (199 "A commo mecaism o e eaceme o mA 3 ocessig y U3 sequeces i wo isay eae eiiuses" J Virol 7(3 11-7

Geeaum C Coageo K. Wiiams a M Gesei (3 "Comaig oei auace a mA eessio ees o a geomic scae" Genome Biol (9 117

Gii S (19(a "Maemaics o ceua coo ocesses I. egaie eeack o oe gee" J Theor Biol 20(2: - Gii S (19( "Maemaics o ceua coo ocesses II osiie eeack o oe gee" J Theor Biol 20(2): 9-1 aaais A I Ko a ui ( "A oaiisic moe o 3 e omaio i Caeoaiis eegas" Nucleic Acids Res 2(: 339-9 a-oga Τ ag ia a C S u (5 "Aeaie oyaeyaio o cycoοygease-" Nucleic Acids Res (8: 55-79 ao S Eoaa M igueieo Y akagaki Maey a K Oae ( "e Drosophila omoogue o e ka suui o ceaage simuaio aco ieacs wi e 77 ka suui ecoe y e suppressor of forked gee" Nucleic Acids Res 28(2: 5- eie U a a Μ C Mackey (19 "e yamics o oucio a esucio aayic isig io come eaio" Journal of mathematical biology 6: 75-11

ee K Μ A a oa Wog Α Mooey K C euma aick a S Μ ock ( "Sequece-esoe eecio o ausig y sige A oymeise moecues" Cell 2(6: 13-9

ooes C eac a K awey (199 "Coiuios o e AA o sequece o ae-imiig ses i asciio iiiaio y KA oymeise II" Mol Biol 2(: 115-31 126

u C S u Wius a ia (5 "ioiomaic ieiicaio o caiae cis-eguaoy eemes ioe i uma mA oyaeyaio" RNA 11(10): 15-93

acoso Α a S W e (199 "Ieeaiosis o e aways o mA ecay a asaio i eukayoic ces" Annu Rev Biochem 65: 93-739

Kaeko S a Maey (5 "e mammaia A oymeise II C-emia omai ieacs wi A o suess asciio-coue 3 e omaio" Mol Cell 20(1): 91-13

Kee W (1995 "o e ye o messege A 3 ocessig" Cell 1( 9-3

Kee W S ieo K M ag a G Cisooi (1991 "Ceaage a oyaeyaio aco C seciicay ieacs wi e e-mA 3 ocessig siga AAUAAA" Embo J 10(13): 1-9

Kases I M iese A iae a ekou (1999 "e aiiy o e I-1 AAUAAA siga o i oyaeyaio acos is cooe y oca A sucue" Nucleic Acids Res 7( -5

Koeky G Α a S Myug (1 "osiie a egaie eguaio o -ce aciaio y aao oeis" Nat Rev Immunol 1(2): 95-17

Koea A Musae M Koo A Maoa ikioo A Goa a S A as ( "A sucua moe o asciio eogaio" Science 9(579 19-5

Ku M S Y Soko Wu M I. ussie-ua A oy a A aa (5 "osiie a egaie eguaio o e asomig gow aco ea/acii age gee goosecoi y e II-I amiy o asciio acos" Mol Cell Biol 5(1 71-57

ai-Oiga I A Eeic Kake e uie a A a e E (199 "Aaysis o oyaeyaio sie usage o e c-myc ocogee" Nucleic Acids Res 17(1 99-51

ae E S M io ie C usaum M C oy awi K eo K ewa M oye e a (1 "Iiia sequecig a aaysis o e uma geome" Nature 9( -91

ee Y I Ye Y ak a ia (7 "oyA_ mA oyaeyaio sies i eeae gees" Nucleic Acids Res 35(aaase issue 15-

ee Y Y ee a Cug ( "A ioess gee ecoig a oy(A oymeise is seciicay eesse i esis" FEES Lett 7( 7-9

egee M a Gauee (3 "Sequece eemias i uma oyaeyaio sie seecio" BMC Genomics (1 7 17

ei iggs Α Gi a ouoo (199 "eiiio o a eicie syeic oy(Α sie" Genes Dev 3(7 119-5

ewi Β (3 Gene VIII, eice a ewis G Wama a S Saues (1997 "Simuaios o ee-uig yms ig Eaime a e ig-use ase esose Cues o e ocomoo Aciiy ym i eio Mua o osoia meaogase" Journal of Theoretical Biology 15( 53-51 iu Β a Β M Aes (1995 "ea-o coisio ewee a A eicaio aaaus a A oymeise asciio come" Science 7(51 1131-7 osso m Μ C Mackey a A ogi (1993 "Souio muisaiiy i is-oe oiea ieeia eay equaios" Chaos 3( 17-17

ou K M eugeaue Gage a S M ege (199 "eguaio o aeaie oyaeyaio y U1 ss a S" Mol Cell Biol 1(9 977-5 u C S K G Muy Scek OCoo Maey a C Awie (199 "Ieacio ewee e U1 s-A oei a e 1-k suui o ceaage-oyaeyaio seciiciy aco iceases oyaeyaio eiciecy i io" Genes Dev 1(3 35-37

Ma iema G Gi a G iciei (199 "osiie a egaie eguaio o ieeuki-1 gee eessio" Eur Cytokine Netw 9(3 Su 5- Mackey M C a Gass (1977 "Osciaio a caos i ysioogica coo sysems" Science 197(3 7-9 Mackey Μ C a I G ecaea (199 "oise a saiiy i ieeia eay equaios" Journal of Dynamics and Differential Equations (3 395-

Magus A Μ C Eas a A acoso (3 "oy(A-iig oeis muiucioa scaos o e os-asciioa coo o gee eessio" Genome Biol (7 3 McCacke S og K Yakuo S aaye G a Geea S aeso M Wickes a eey (1997 "e C-emia omai o A oymeise II coues mA ocessig o asciio" Nature 35(1 357-1

Mieoi S Geagy Iowu am M Aoiou a S age ( "Α oe ucio o e UΑ 5 sicig aco i omoig e-mA 3-e ocessig" EMBO Rep 3(9 9-7 Mok A (3 "Osciaoy eessio o es 53 a -kaa ie y asciioa ime eays" Curr Biol 13(1 19-13 28

Moeia Α Y akagaki S ackeige M Woeo Maey a ouoo (199 "e useam sequece eeme o e C comeme oy(Α siga aciaes mA 3 e omaio y wo isic mecaisms" Genes Dev 2(6: 5-3

Muay (199 Mathematical Biology. ei Sige

Muy K G a Maey (199 "Caaceiaio o e muisuui ceaage-oyaeyaio seciiciy aco om ca ymus" J Biol Chem 7(1 1-11

Muy K G a Maey (1995 "e 1-k suui o uma ceaage-oyaeyaio seciiciy aco cooiaes e-mA 3-e omaio" Genes Dev 9(1 7-3

Myes E W G G Suo A ece I. M ew asuo M aiga S A Kai C M Moay Κ eie e a ( "A woe-geome assemy o osoia" Science 7(51 19-

ie G Wu a W ag ( "Coeaio o mA eessio a oei auace aece y muie sequece eaues eae o asaioa eiciecy i esuoiio ugais Α quaiaie aaysis" Genetics.

ikaio I C Saio Y Miuo M Meguo oo M Kaomua Koo G Α Mois Α yos e a (3 "iscoey o imie ascis i e mouse asciome usig age-scae eessio oiig" Genome Res (6Β: 1-9

OGoma W K Y Kwek omas a A Akouice ( "o-coig A i asciio iiiaio" Biochem Soc Symp(73): 131-

Oaies G a eieg ( "A uiie eoy o gee eessio" Cell 08(4: 39-51

a ag K ague Y ee C S u a ia ( "A ioic oyaeyaio sie i uma a mouse Cs-77 gees suggess a eouioaiy cosee eguaoy mecaism" Gene 66(2: 35-3

ak C sao a G Maiso ( "e wo ses o oy(A-eee emiaio ausig a eease ca e ucoue y ucaio o e A oymeise II caoy-emia eea omai" Mol Cell Biol 24(0: 9-13

ee Caaias M a G aai (3 "ecogiio o GU-ic oyaeyaio eguaoy eemes y uma Cs- oei" EMBO J 22(: 1-3

iis C acikaa a S I. Gueso ( "UA iiis ceaage a e immuogoui M eay-cai seceoy oy(Α sie y iig ewee e wo owseam GU-ic egios" Mol Cell Biol 24(4: 1-71 19

ouoo ( "ew esecies o coecig messege A 3 e omaio o asciio" Curr Opin Cell Biol 1(3 7-

Qiu a ie ( "Aeaie oyaeyaio o aeo-associae ius ye 5 A wii a iea io is goee y e isace ewee e omoe a e io a is iiie y U1 sma ucea iig o e ieeig oo" JBiol Chem 79(15 19-9

Quee Maco M eeue O Ceme iea oaou Commes iquema a Mai ( "Miig SAGE aa aows age-scae sesiie sceeig o aisese asci eessio" Nucleic Acids Res 3( e13

oca E a A aci (3 "Esseiaiy o eessieess ies gee-sa ias i aceia" Nat Genet 3( 377-

osee W Youg U Ao S Swai a M Eowi (5 "Gee eguaio a e sige-ce ee" Science 37(5717 19-5

uy S W (1997 "yamics o e U1 sma ucea iouceooei uig yeas siceosome assemy" JBiol Chem 7( 17333-1

uegsegge U K eye a W Kee (199 "uiicaio a caaceiaio o uma ceaage aco Im ioe i e 3 e ocessig o messege A ecusos" JBiol Chem 71(11 17-13

uegsegge U ak a W Kee (199 "uma e-mA ceaage aco Im is eae o siceosoma S oeis a ca e ecosiue i io om ecomia suuis" Mol Cell 1( 3-53

Sacs A Β Saow a M W ee (1997 "Saig a e egiig mie a e asaio iiiaio i eukayoes" Cell 9( 31-

Saa S A Β Saks C ago Akmae C Wag ogesei K W Kie a E ecuescu ( "Usig e asciome o aoae e geome" Nat Biotechnol (5 5-1

Saamo A A a Sooye (1997 "ecogiio o 3-οcessig sies o uma mA ecusos" Comput App! Biosci 13(1 3-

Scee Kikeeg C ea a a e (1999 "A maemaica moe o e iaceua cicaia ym geeao" J Neurosci 19(1 -7

Samie a S (1 "Soig Es i MAA" Applied Numerical Mathematics, 37 1-5

Se S A C esse S M Mois a C Micaek (5 "Eeae ees o e -ka ceaage simuaoy aco (Cs- i iooysaccaie-simuae macoages iuece gee eessio a iuce aeaie oy(Α sie seecio" JBiol Chem ( 3995-1 13

Simoeig M K Eio A Miceso a K Oae (199 "Ieaeic comemeaio a e suesso o oke ocus o osoia eeas comemeaio ewee suesso o oke oeis muae i iee egios" Genetics 1( 15-35

Sousa aa a E M ie (199 "Moe o e mecaism o aceioage 7 A asciio iiiaio a emiaio" J Mol Biol ( 319-3

Saic E ock K oue S E ee S A Cei C agigia G uee G Gie I Ko e a ( "e ioe ooki e moues o e ie scieces" Genome Res 1(1 111-

Sei ( "uma geome e o e egiig" Nature 31(711 915-

Seime E a A ow (3 "Ssu7 oei meiaes o oy(A-coue a oy(A-ieee emiaio o A oymeise II asciio" Mol Cell Biol 3(1 339-9

Sug S I Goo G ose E Geo S Ko Mumo Oucic Scee Sumes e a (3 "Eucaig uue Scieiss" Science 31 15

e C (3 Introductory Biostatistics. ooke ew esey o Wiey Sos Ic

aaska E a M Q ag (1999 "eecio o oyaeyaio sigas i uma A sequeces" Gene 31(1- 77-

akagaki Y a Maey (199 "A oyaeyaio aco suui is e uma omoogue o e Drosophila suppressor of forked oei" Nature 37(55 71-

akagaki Y a Maey (199 "ees o oyaeyaio aco Cs- coo IgM eay cai mA accumuaio a oe ees associae wi ce ieeiaio" Mol Cell ( 71-71

akagaki Y a Maey ( "Come oei ieacios wii e uma oyaeyaio maciey ieiy a oe comoe" Mol Cell Biol (5 1515-5

akagaki Y Maey C C Macoa Wius a Sek (199 "Α muisuui aco Cs is equie o oyaeyaio o mammaia e-mAs" Genes Dev (1Α 11-

akagaki Y C ye a Maey (199 "ou acos ae equie o 3-e ceaage o e-mAs" Genes Dev 3(11 1711-

akagaki Y Seie M eeso a Maey (199 "e oyaeyaio aco Cs- eguaes aeaie ocessig o IgM eay cai e-mA uig ce ieeiaio" Cell 7(5 91-5 131

ei S Waace essey Cegg Weaea a iggs (19 "e oyaeyaio sie muaio i e aa-goi gee cuse" Blood 71( 313-9

ia u ag a C S u (5 "A age-scae aaysis o mA oyaeyaio o uma a mouse gees" Nucleic Acids Res 33(1 1-1

ia a a Y ee (7 "Wiesea mA oyaeyaio ees i ios iicae yamic ieay ewee oyaeyaio a sicig" Genome Res 17( 15-5

uea a uea ( "Seia aaysis o gee eessio (SAGE uaeig e ioiomaics oos" Bioessays ( 91-

asamakis A Scek a C Awie (199 "Eemes useam o e AAUAAA wii e uma immuoeiciecy ius oyaeyaio siga ae equie o eicie oyaeyaio i io" Mol Cell Biol 1(9 399-75

ecuescu E ag ogesei a K W Kie (1995 "Seia aaysis o gee eessio" Science 7(535 -7

ekaaama K K M ow a G M Gimai (5 "Aaysis o a ocaoica oy(A sie eeas a iaie mecaism o eeae oy(Α sie ecogiio" Genes Dev 19(11 1315-7

ee C M Aams E W Myes W i Mua G G Suo O Smi M Mae C A Eas e a (1 "e sequece o e uma geome" Science 91(5 57 13-51

eai K G K Ai K Maicic Cug-Gase Wius a C Micaek (1 " iueces iig o a -kiοao suui o ceaage simuaio aco o mA ecusos i mouse ces" Mol Cell Bio121(4): 1-3

Wage A (5 Robustness and Evolvability in Living Systems, iceo Uiesiy ess

Wae E (1991 "A oe oy(A-iig oei acs as a seciiciy aco i e seco ase o messege A oyaeyaio" Cell ( 759-

Wickes M Aeso a ackso (1997 "ie a ea i e cyoasm messages om e 3 e" Curr Opin Genet Dev 7( -3

Woo S (3 "Eβ 5" om

iog W a E ee (3 "A osiie-eeack-ase isae memoy moue a goes a ce ae ecisio" Nature (95 -5 13

a a G Ma (5 "Comuaioa aaysis o 3-es o ESs sows ou casses o aeaie oyaeyaio i uma mouse a a" Genome Res 15(3 39-75

Yeo G ose G Keima a C uge ( "aiaio i aeaie sicig acoss uma issues" Genome Biol 5(1 7 Youg Α M Gue a C A Goss ( "iews o asciio iiiaio" Cell 19( 17-

auaya Μ I I M Koomies A oyaayo a M oou (3 "owseam eemes o mammaia e-mA oyaeyaio sigas imay secoay a ige-oe sucues" Nucleic Acids Res 31(5 1375-

ag u M ecce a ia (5 "oyA_ a aaase o mammaia mA oyaeyaio" Nucleic Acids Res 33 aaase Issue 1-

ag M Q ( "iscimia aaysis a is aicaio i A sequece moi ecogiio" Brief Bioinform 1( 331-

ag Η K A ee I ee C S esie a A Casi (3 "Sequece iomaio o e sicig o uma e-mA ieiie y suo eco macie cassiicaio" Genome Res 13(1 37-5

ao yma a C Mooe (1999 "omaio o mA 3 es i eukayoes mecaism eguaio a ieeaiosis wi oe ses i mA syesis" Microbiol Mol Biol Rev 3( 5-5

ao W a Maey (199 "Come aeaie A ocessig geeaes a ueece iesiy o oy(Α oymeise isooms" Mol Cell Biol 1(5 37-