ISOLATION AND CHARACTERIZATION OF PARTIAL FRAGMENT OF MITOCHONDRIAL GENOME (12S rRNA - ND2') OF RASBORA SARAWAKENSIS
ALICE LAU KUI YAN (46219)
Bachelor of Science with Honours (Resource Biotechnology) 2017
Isolation and Characterization of a Partial Fragment of Mitochondrial Genome (12S rRNA - ND2') from Rasbora sarawakensis
ALICE LAU KUI YAN (46219)
A thesis submitted in partial fulfilment of the requirements for the degree of Bachelor of Science with Honours (Resource Biotechnology)
Supervisor: DR CHUNG HUNG HUI
Bachelor of Science (Honours) Resource Biotechnology
Department of Molecular Biology
Faculty of Resource Science and Technology
Universiti Malaysia Sarawak
2017
ACKNOWLEDGEMENT
First and foremost, I would like to thank God for granting me strength, ability and opportunity
to undertake and complete this research study. Without His blessing, this achievement would
not have been possible.
Next, I would like to express my thanks and endless appreciation to my supervisor, Dr. Chung Hung Hui, for his continued guidance and support throughout the project. I have been extremely lucky to have a supervisor who cared so much about my work. He has been there providing his valuable knowledge and advice at all times to ensure that I always stayed on the core of my research. I would also like to thank Dr. Lee Kul Soon for kindly allowing me to use the machine in their laboratories.
I have great pleasure in acknowledging my gratitude to the post-graduate students in the Animal Biotechnology Lab, Aimi and Leonard. Their guidance and encouragement contributed greatly to the completion of the thesis. Besides that, I would like to thank my lab mates, Woan Ting, Chew Pei Ling, Syahril, Chung Pey Lin, Lee Ying, Vivien, Abigail and Fatin, for giving me various forms of assistance and motivation during the hard times. I would also like to express my gratitude to my friends, Zi Wei, Kai Jia, Caren, Chloe, Jinny, Jia Min, Jeffrey, Kang Young, William and Woei Hann, for their moral support and motivation.
My acknowledgement would be incomplete without giving special thanks to my family members: my dad Lau Hieng Yioung, my mum Shu How Ming, my sisters Agnes and Grace and my brother Andrew, for their undying love. Their prayers for me were what sustained me thus far.
UNIVERSITI MALAYSIA SARAWAK
Grade:
Please tick ( ): Final Year Project Report ( ) Masters ( ) PhD ( )
DECLARATION OF ORIGINAL WORK
This declaration is made on the ...... day of ...... 2017.
Student's Declaration:
I, Alice Lau Kui Yan, 46219, Faculty of Resource Science and Technology, hereby declare that the work entitled Isolation and Characterization of Partial Fragment of Mitochondrial Genome (12S rRNA - ND2') of Rasbora sarawakensis is my original work. I have not copied from any other students' work or any other sources except where due reference or acknowledgement is made explicitly in the text, nor has any part of the work been written for me by another person.
Date submitted                    Alice Lau Kui Yan (46219)
Supervisor's Declaration:
I, Dr. Chung Hung Hui, hereby certify that the work entitled Isolation and Characterization of Partial Fragment of Mitochondrial Genome (12S rRNA - ND2') of Rasbora sarawakensis was prepared by the above named student, and was submitted to the "FACULTY" as a * partial/full fulfillment for the conferment of Bachelor of Science with Honours (Resource Biotechnology), and the aforementioned work, to the best of my knowledge, is the said student's work.
Received for examination by: (Dr. Chung Hung Hui)                    Date:
I declare that Project/Thesis is classified as (Please tick (√)):
CONFIDENTIAL (Contains confidential information under the Official Secret Act 1972)*
RESTRICTED (Contains restricted information as specified by the organisation where research was done)*
OPEN ACCESS
Validation of Project/Thesis
I hereby duly affirm with free consent and willingly declare that this said Project/Thesis shall be placed officially in the Centre for Academic Information Services with the abiding interest and rights as follows:
• This Project/Thesis is the sole legal property of Universiti Malaysia Sarawak (UNIMAS).
• The Centre for Academic Information Services has the lawful right to make copies of the Project/Thesis for academic and research purposes only and not for other purposes.
• The Centre for Academic Information Services has the lawful right to digitize the content to be uploaded into the Local Content Database.
• The Centre for Academic Information Services has the lawful right to make copies of the Project/Thesis if required for use by other parties for academic purposes or by other Higher Learning Institutes.
• No dispute or any claim shall arise from the student himself/herself nor a third party on this Project/Thesis once it becomes the sole property of UNIMAS.
• This Project/Thesis or any material, data and information related to it shall not be distributed, published or disclosed to any party by the student himself/herself without first obtaining approval from UNIMAS.
Student's signature: (Date)                    Supervisor's signature: (Date)
Current Address:
Notes: * If the Project/Thesis is CONFIDENTIAL or RESTRICTED, please attach together as annexure a letter from the organisation with the date of restriction indicated, and the reasons for the confidentiality and restriction.
[The instrument was prepared by The Centre for Academic Information Services]
Table of Contents
Acknowledgement
Declaration
Table of Contents
List of Abbreviations
List of Figures
List of Tables
Abstract
1.0 Introduction
1.1 Background
1.2 Objectives
2.0 Literature Review
2.1 Mitochondrial genome
2.2 Phylogenetic analysis
2.3 Taxonomy classification
2.4 Cyprinidae family
2.5 Rasbora genus
2.6 Rasbora sarawakensis
3.0 Materials and Methods
3.1 Sample collection and preparation
3.2 Total DNA extraction from adult Rasbora sarawakensis
3.3 Agarose gel electrophoresis
3.4 Primer design
3.5 Gradient PCR
3.6 Purification of PCR amplified products
3.7 DNA sequencing and sequence analysis
3.8 Phylogenetic analysis
4.0 Result
4.1 Total DNA extraction
4.2 Primer design and synthesis
4.3 Temperature optimization of primers by gradient PCR
4.4 DNA purification
4.5 Sequence analysis
4.6 Phylogenetic tree
5.0 Discussion
5.1 Sequence analysis
5.2 Phylogenetic analysis
6.0 Conclusion
7.0 Reference
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F
Appendix G
Appendix H
Appendix I
List of Abbreviations
ATP Adenosine triphosphate
BLAST Basic Local Alignment Search Tool
bp Base pair
COI Cytochrome oxidase subunit I
CSB Conserved Sequence Blocks
D-loop Displacement loop
DNA Deoxyribonucleic Acid
mL Millilitres
NADH Nicotinamide adenine dinucleotide
NCBI National Center for Biotechnology Information
ND1 NADH dehydrogenase subunit 1
PCR Polymerase Chain Reaction
spp Species
rRNA Ribosomal Ribonucleic Acid
Tm Melting temperature
Taq Thermus aquaticus
tRNA Transfer Ribonucleic Acid
UV Ultraviolet
µL Microlitres
List of Figures
Figure
2.1 General mitochondrial DNA structure with gene and regulatory regions labelled. (Adapted from Kyriakouli et al., 2014).
2.2 The principle of the neighbour-joining method is to find pairs of operational taxonomic units (OTUs) that minimize the total branch length at each stage of clustering of OTUs, starting with a starlike tree. (Adopted from Saitou & Nei, 1997).
2.3 The use of maximum parsimony to choose between two possible trees. In tree A, only one change is needed to explain the pattern of nucleotides at a particular sequence position, while tree B requires two changes in nucleotides. The parsimony method would choose tree A as it provides a simpler explanation of the evolutionary relationship. (Adopted from Vigilant et al., 1991).
2.4 In the maximum likelihood method, probabilities are considered for every individual nucleotide substitution in a set of sequence alignments. (Adopted from Cawley, 1998).
2.5 Some examples of economically important cyprinid fishes found in Southern Asia. (Adapted from Google source. (n.d.). Retrieved August 4, 2016 from https://www.google.com/search?q=cyprinidae&source=lnms&tbm=isch&sa=X&ved=OahUKEwjgjJ_js9XTAhWMs48KHRAnCxUQ_AUICiB&biw=1242&bih=557#imgrc=xLwHM-mY6RTCAM)
2.6 Some examples of Rasbora species. (Adapted from Google source. (n.d.). Retrieved August 4, 2016 from https://www.google.com/search?biw=1242&bih=557&tbm=isch&sa=l&q=Rasbora+genus&oq=Rasbora+genus&gs_l=img.3...20199854.20199854.0.20200637.1.1.0.0.0.0.122.122.Oj1.1.0.01c.1.64.img..0.0.0.rXMxHibKOIw)
2.7 Rasbora sarawakensis from Sungai Sebat, Sarawak, Malaysia (Adapted from Rasbora sarawakensis, 2004).
4.1 Agarose gel electrophoresis result after total DNA extraction. Lane L was loaded with 1 kb DNA ladder (Promega, USA) while Lanes 1 and 2 were loaded with the extracted DNA sample.
4.2 Agarose gel electrophoresis result from gradient PCR of the five fragments. Lane L was loaded with 1 kb DNA ladder (Promega, USA). Lane 4 represents the negative control. (A) Lanes 1, 2 and 3 represent PCR amplicons produced at 54.0°C, 52.6°C and 49.7°C respectively. The amplicon sizes were close to the expected size of 963 bp. (B) Lanes 1, 2 and 3 represent PCR amplicons produced at 53.8°C, 50.3°C and 48.6°C respectively. The amplicon sizes were close to the expected size of 1286 bp. (C) Lanes 1, 2 and 3 represent PCR amplicons produced at 53°C, 51.5°C and 49.3°C respectively. The amplicon sizes were close to the expected size of 941 bp.
4.3 Agarose gel electrophoresis result after DNA purification of the five fragments. Lane L was loaded with 1 kb DNA ladder (Promega, USA). (A) Lane 1 represents the purified DNA product at optimum temperature 54.5°C with expected amplicon size 963 bp. (B) Lane 1 represents the purified DNA product at optimum temperature 50.3°C with expected amplicon size 1286 bp. (C) Lane 1 represents the purified DNA product at optimum temperature 53.0°C with expected amplicon size 941 bp.
4.4 Five fragments were trimmed together to generate a long continuous sequence for phylogenetic tree construction.
4.5 The dendrogram of the phylogenetic tree constructed using the maximum likelihood criterion.
5.1 Five sets of designed primers that provide overlapping regions with one another.
List of Tables
Table
3.1 Rasbora spp and their GenBank accession number
3.2 List of PCR reaction mix reagent components
3.3 The parameters of PCR reaction
3.4 The Rasbora spp chosen for phylogenetic analysis and their GenBank accession number
4.1 Five sets of designed forward and reverse primers
4.2 Summary of Nucleotide BLAST of five fragments and the trimmed complete partial fragment of mitochondrial genome (12S rRNA - ND2') of R. sarawakensis
4.3 Summary of gene characteristics encoded in the partial fragment of mitochondrial genome (12S rRNA - ND2') of R. sarawakensis
Isolation and Characterization of Partial Fragment of the Mitochondrial Genome (12S rRNA - ND2') of Rasbora sarawakensis
Alice Lau Kui Yan
Resource Biotechnology
Faculty of Resource Science and Technology
Universiti Malaysia Sarawak
ABSTRACT
Rasbora sarawakensis is a popular ornamental cyprinid fish that is native to Borneo. However, the lack of diagnostic morphological characteristics has hindered the identification of this species within the genus Rasbora. Sequence analysis of the mitochondrial genome is a fast and accurate taxonomic method to identify and classify such species. In this study, a partial fragment of the mitochondrial genome (12S rRNA - ND2') of R. sarawakensis was isolated and analysed for the characterization of R. sarawakensis. The mitochondrial genome was used instead of the nuclear genome because it is maternally inherited and can therefore trace the direct maternal ancestor of R. sarawakensis. The aim of this study is to isolate and characterize a partial fragment of the mitochondrial genome (12S rRNA - ND2') of R. sarawakensis. Total genomic DNA from R. sarawakensis was isolated using the CTAB method. Five pairs of gene-specific primers were designed based on the conserved gene sequences from a multiple sequence alignment of three Rasbora spp. for a primer walking approach. The five short fragments amplified were each around 1000 bp in size. The trimmed sequencing result and BLASTn analysis showed that this fragment (12S rRNA - ND2') has a high similarity of 87% with R. trilineata. The phylogenetic analysis further strengthens the point that R. sarawakensis shares a close ancestral relationship with R. trilineata in the maximum likelihood tree. This is important for further study in the biodiversity assessment of Rasbora spp. in order to regulate international trade of these ornamental fish in the future.
Key words: Mitochondrial genome, Rasbora sarawakensis, gradient PCR, phylogenetic tree
ABSTRAK
Rasbora sarawakensis is a popular ornamental cyprinid fish native to Borneo. However, the lack of diagnostic morphological characters has hindered the identification of this species within the genus Rasbora. Sequence analysis of the mitochondrial genome is a fast and accurate taxonomic method for identifying and classifying this species. In this study, part of the mitochondrial genome (12S rRNA - ND2') of Rasbora sarawakensis was isolated and analysed for the characterization of R. sarawakensis. The mitochondrial genome was used rather than the nuclear genome because the mitochondrial genome can trace the direct maternal ancestor of R. sarawakensis, as the mitochondrial genome is inherited through the maternal line. The aim of this study is to isolate and characterize part of the mitochondrial genome (12S rRNA - ND2') of R. sarawakensis. Total genomic DNA of R. sarawakensis was isolated using the CTAB method. Five pairs of primers were designed based on conserved gene sequences from a "multiple sequence alignment" of three Rasbora species for the "primer walking" method. The five amplified PCR products were each about 1000 bp in size. The sequencing results and BLASTn analysis of this part of the genome (12S rRNA - ND2') showed a homology of 87% with R. trilineata. Phylogenetic analysis confirmed the close evolutionary relationship of R. sarawakensis with R. trilineata in the "maximum likelihood" tree. This is important for further study in the biodiversity assessment of Rasbora species in order to regulate the international trade of these ornamental fish in the future.
Kata kunci: Mitochondrial genome, Rasbora sarawakensis, gradient PCR, phylogenetic tree
1.0 Introduction
1.1 Background
Rasbora sarawakensis is a species of ray-finned fish in the family Cyprinidae, which is the largest known freshwater fish family. It belongs to the subfamily Rasborinae, a diverse group that is probably best known because it includes the model organism, the zebrafish, as well as many popular aquarium species (Mayden, 2008). Rasboras are ideal ornamental fishes as they are beautiful, elegant and peaceful. According to the Department of Fisheries Malaysia (2011), there has been increased global demand for ornamental fish exported from Malaysia, especially from Singapore, Russia, the USA and Europe. Among these ornamental fish, rasboras took up 24% of the market. This situation might pose potential threats to these freshwater fish. Hence, it is necessary to identify and classify the species for further study in the biodiversity assessment of Rasbora spp.
Previously, mitochondrial genome analysis had been identified as a more reliable approach to species classification than morphological analysis for delineating species (Dasmahapatra & Mallet, 2006). The mitochondrial genome is smaller than the nuclear genome, with about 16,500 base pairs (Taanman, 1999). It contains 37 genes, which provide instructions for normal mitochondrial functions. Thirteen of these genes encode protein subunits of enzymes involved in oxidative phosphorylation. The remaining genes encode 22 transfer RNAs and 2 ribosomal RNAs, which are necessary for the translation of the proteins encoded by the mitochondrial genome (Taanman, 1999).
Since the mitochondrial genome has a smaller size, reduced gene content and a compact nature, it provides straightforward genome assemblies. It is very commonly used in phylogenetic studies as the mitochondrial genome is maternally inherited (Castro et al., 1998). It can trace the direct maternal ancestor of an organism and provide evidence of phylogenetic relationships. According to Shao et al. (2002), only molecular identification can guarantee the identification of fish eggs to species level.
The number of potential gene arrangements in the mitochondrial genome is so large that different taxa are unlikely to adopt an identical arrangement independently. Thus, comparison of mitochondrial gene arrangements provides significant insights into the evolution of organisms, helping to identify species and contributing to biodiversity assessment (Boore, 1999). Furthermore, these molecules can be used in population genetic studies. In 1984, Avise and Saunders presented the application of mitochondrial DNA to the study of the population dynamics of different mitochondrial haplotypes in Lepomis species (freshwater sunfish). According to recent studies, DNA barcoding of mitochondrial gene sequences has been applied to many animals such as birds (Paul et al., 2004), skipper butterflies (Hebert et al., 2004), as well as more than 8,000 species of fishes (Ratnasingham & Hebert, 2007).
However, in Malaysia, there is a lack of molecular studies on fish identification and taxonomy, and no specific phylogenetic relationship regarding Rasbora species has been published. The homogeneous appearance of R. sarawakensis and some other Rasbora spp. increases the difficulty of differentiating these species. This may cause confusion in the classification of Rasbora spp., which leads to poorly regulated international trade in these ornamental fish and eventually poses risks to both biodiversity and economic activities via invasive alien species. Hence, the aim of this research is to construct the phylogenetic relationship of R. sarawakensis with other Rasbora spp. based on mitochondrial genomic sequences in order to address these problems.
1.2 Objectives
The objectives throughout the completion of this project are:
1. To isolate and analyse the DNA sequence of the partial fragment of mitochondrial
genome (12S rRNA- ND2') of R. sarawakensis
2. To understand the phylogenetic relationship of R. sarawakensis with other Rasbora
species
2.0 Literature Review
2.1 Mitochondrial genome
The mitochondrial genome encodes proteins specific to mitochondria and enables them to replicate on their own. According to the endosymbiosis theory, mitochondria arose from bacteria-like cells that were taken in and assimilated by eukaryotic cells and eventually became organelles in those cells. This theory has been supported by modern sequencing techniques (Taanman, 1999). Analysis of rRNA sequences showed that mitochondrial genomes differ from the nuclear genome of the eukaryotic host cell and closely resemble eubacterial genomes (Castro et al., 1998).
DNA in mitochondria was first detected in late 1963 in highly purified yeast by biochemical procedures (Ernster & Schatz, 1981). In 1966, van Bruggen et al. indicated that mitochondrial DNA from higher eukaryotes is a circular double-stranded molecule whose size differs between species (Ernster & Schatz, 1981). The parasite Plasmodium falciparum has the smallest mitochondrial genome sequenced to date, with 5,967 base pairs (Conway et al., 2000), while the largest mitochondrial genome sequenced to date, with 366,924 base pairs, is from the plant Arabidopsis thaliana (Unseld et al., 2000).
The mitochondrial genome is more commonly used in DNA barcoding than the nuclear genome because mitochondrial genes lack introns and are generally inherited in a haploid mode from breeding females (Hebert et al., 2003). This is because the mitochondria of male gametes are mostly located in the tail, where they generate energy for sperm locomotion, and the tail remains outside the ovum when fertilization takes place. As the mitochondrial genome is maternally inherited, recombination in the mitochondrial genome is limited. Hence, compared to the nuclear genome with its biparental inheritance, mitochondrial genes can be expected to share the same underlying genealogy among species (Luo et al., 2011).
Furthermore, the mitochondrial genome has a higher evolutionary rate, 5 to 10 times that of the nuclear genome (Castro et al., 1998). This is because the mitochondrial genome does not have DNA damage repair enzymes to recognize and remove mis-incorporated bases, which leads to a higher rate of single mutations (Paul et al., 2004). Mutations in the coding sequences and control region cause it to evolve rapidly, but the spatial arrangement of genes and the genome size are constant among species (Castro et al., 1998). The high mutation rate leads to a higher degree of variability between individuals (Castro et al., 1998). Besides that, it has a higher copy number than the nuclear genome, which can be up to 10 copies in one mitochondrion (Phillips et al., 2014). These characteristics give the mitochondrial genome an advantage over the nuclear genome in routine amplification through polymerase chain reaction as well as in use as a molecular marker (Taanman, 1999).
Generally, the asymmetric distribution of guanine and cytosine in the circular mitochondrial genome permits the separation of the genome into a heavy strand and a light strand. The heavy strand encodes the majority of the genes, namely 2 rRNAs, 14 tRNAs and 12 polypeptides. The remaining 8 tRNAs and one polypeptide are encoded on the light strand (Moraes et al., 2002). Apart from the regulatory region containing the promoter and the origin of heavy strand replication, the genome has very few non-coding intergenic regions.
The mitochondrial genome uses a genetic code that differs from that of the nuclear genome. For instance, TGA is a termination codon in the nuclear genome but codes for tryptophan in the mitochondrial genome. The mitochondrial genome contains a displacement loop (D-loop) region, which is a three-stranded structure (Taanman, 1999). The majority of the D-loop region comprises Conserved Sequence Blocks (CSB) I, CSB II and CSB III (Moraes et al., 2002). As these CSBs are located in the D-loop region, close to the initiation site for heavy strand DNA replication, they are proposed to be involved in mitochondrial DNA replication. There are 2 rRNAs, 22 tRNAs and 13 protein subunits encoded in the mitochondrial genome. The two rRNA species are 12S and 16S. The 22 tRNAs are sufficient to translate all 13 protein subunits (Moraes et al., 2002).
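The difference in genetic code noted above can be illustrated with a short sketch. The codon tables below are not taken from this thesis: they follow the standard code and the vertebrate mitochondrial code as listed by NCBI (translation tables 1 and 2), and the 12-base test sequence is a hypothetical example.

```python
from itertools import product

# Standard genetic code in NCBI order (first, second, third base each cycling T, C, A, G).
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(codon): aa
            for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial code: the four codons that differ from the standard code.
VERT_MITO = dict(STANDARD)
VERT_MITO.update({"TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"})


def translate(dna, table):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical sequence containing TGA and ATA.
seq = "ATGTGAATATAA"
print(translate(seq, STANDARD))   # M*I*
print(translate(seq, VERT_MITO))  # MWM*
```

With the nuclear (standard) table the reading frame stops at TGA, whereas the mitochondrial table reads it as tryptophan, which is why mitochondrial protein-coding genes should be translated with the mitochondrial table.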
The 13 protein subunits form part of the protein complexes involved in the respiratory chain and ATP synthesis, namely protein complexes I, III, IV and V. Seven subunits are involved in the formation of NADH dehydrogenase (protein complex I): ND1, ND2, ND3, ND4, ND4L, ND5 and ND6. The subunit in protein complex III is cytochrome b. Three subunits are involved in the formation of cytochrome c oxidase (protein complex IV): COI, COII and COIII. ATP 6 and ATP 8 are the subunits involved in adenosine triphosphate synthase (protein complex V). The positions of the genes encoding these protein subunits in the mitochondrial genome are shown in Figure 2.1.
The alignment of the mitochondrial DNA sequences of an organism with the gene sequences of other species is the first step in comparative mitochondrial genomics analysis. The level of sequence conservation across different species can be determined through analysis of features such as nucleotide composition and gene order in order to investigate the evolution of the genomes (Wei et al., 2002).
Figure 2.1. General mitochondrial DNA structure with gene and regulatory regions labelled. (Adapted from Kyriakouli et al., 2014).
2.2 Phylogenetic analysis
Phylogenetic trees have been used in evolutionary biology since 1837, starting with Charles Darwin (Gregory, 2008). A phylogenetic tree is constructed to represent the evolutionary relationships and descent of different species from a common ancestor. It shows the relatedness among organisms and is a convenient way to study phylogenetic relationships between species. It provides a rigorous framework to guide biological research that involves evolutionary history (Avise, 2006). DNA sequence alignment has been used to investigate the similarities and differences between species. The aligned sequences may reflect functional, structural or evolutionary relationships between species.
There are two types of tree-building methods: distance-based methods and character-based methods. Distance-based methods compute pairwise distances and then discard the actual data, using only the fixed distances to build the phylogenetic tree (Baxevanis & Ouellett, 2001). On the other hand, character-based methods build the tree by optimizing the distribution of the actual data patterns for each character. Neighbour-joining is the most commonly applied distance-based method, while maximum parsimony and maximum likelihood are the main character-based methods used in phylogenetic analysis (Baxevanis & Ouellett, 2001).
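As a minimal sketch of the first step of a distance-based method, the code below computes pairwise distances from aligned sequences. It assumes the simplest distance measure, the proportion of differing sites (p-distance), which is an illustrative choice and not necessarily the measure used in this study; the taxon names and sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ, skipping sites with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances for a dict mapping taxon name to aligned sequence."""
    names = list(alignment)
    return {a: {b: p_distance(alignment[a], alignment[b]) for b in names}
            for a in names}


# Hypothetical aligned fragments for three taxa.
alignment = {
    "taxon1": "ACGTACGT",
    "taxon2": "ACGTACGA",
    "taxon3": "ACGAACGA",
}
matrix = distance_matrix(alignment)
print(matrix["taxon1"]["taxon2"])  # 0.125
print(matrix["taxon1"]["taxon3"])  # 0.25
```

Once such a matrix is computed, a distance-based method works only with these numbers, which is what is meant by discarding the actual data.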
The neighbour-joining algorithm works through a step-by-step building procedure, regardless of the optimization criterion. It estimates the tree based on distance matrices (Saitou & Nei, 1997). This method is comparatively rapid. The fully resolved tree is formed from a fully unresolved "star" tree by successively inserting branches between a pair of closest neighbours and the remaining terminals in the tree. The closest neighbour pair is then consolidated, effectively reforming a star tree, and the process is repeated until only one terminal remains (Saitou & Nei, 1997) (Refer to Figure 2.2). It does not permit global alignment to eliminate negative branches because it does not assume that the evolution of all lineages happens at the same rate (Baxevanis & Ouellett, 2001).
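A single clustering step of neighbour-joining can be sketched as follows. The selection rule used here, joining the pair that minimizes Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r is the sum of distances from a taxon to all others, is the commonly used formulation of the method and is an assumption of this sketch; the thesis does not state the formula. The five-taxon distance matrix is hypothetical.

```python
def nj_pick_pair(names, dist):
    """One neighbour-joining step: pick the pair to join and the two new branch lengths."""
    n = len(names)
    # Sum of distances from each taxon to every other taxon.
    totals = {a: sum(dist[a][b] for b in names if b != a) for a in names}
    best_pair, best_q = None, None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            q = (n - 2) * dist[a][b] - totals[a] - totals[b]
            if best_q is None or q < best_q:
                best_pair, best_q = (a, b), q
    a, b = best_pair
    # Branch lengths from each joined taxon to the new internal node.
    len_a = dist[a][b] / 2 + (totals[a] - totals[b]) / (2 * (n - 2))
    len_b = dist[a][b] - len_a
    return best_pair, best_q, (len_a, len_b)


# Hypothetical distances between five taxa.
names = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {a: dict(zip(names, row)) for a, row in zip(names, rows)}

pair, q, lengths = nj_pick_pair(names, dist)
print(pair, q, lengths)  # ('a', 'b') -50 (2.0, 3.0)
```

Note that taxa d and e have the smallest raw distance (3), yet a and b are joined first. The correction by the row sums is what allows the method to work without assuming that all lineages evolve at the same rate. A full implementation would replace a and b with the new node, update the distances and repeat.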
Maximum parsimony is an optimization criterion that adheres to the principle that the best explanation of the data is provided by the shortest tree, the one with the fewest changes (Huelsenbeck, 1995). It minimizes the branch length by reducing the number of mutations to a minimum. In the maximum parsimony algorithm, two species are considered genetically related if they tend to share the same nucleotide at the same position (Refer to Figure 2.3). However, this method performs poorly when there is substantial among-site rate heterogeneity (Huelsenbeck, 1995). One approach to this problem is to modify the data set to include only sites that exhibit little or no heterogeneity. This method also tends to yield numerous trees that have the same score.
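The counting of changes under maximum parsimony can be sketched with Fitch's small-parsimony algorithm, which is one standard way to obtain the minimum number of changes for a single site on a fixed tree; it is used here as an illustration and is not described in the thesis. The four samples, the site pattern and both tree shapes are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one site.

    `tree` is a nested tuple of leaf names, e.g. (("s1", "s2"), ("s3", "s4")).
    `states` maps each leaf name to the nucleotide observed at the site.
    """
    if isinstance(tree, str):            # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:                           # children agree: no extra change
        return common, left_changes + right_changes
    # children disagree: one additional change is required
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical site pattern for four samples.
site = {"s1": "G", "s2": "G", "s3": "T", "s4": "T"}
tree_a = (("s1", "s2"), ("s3", "s4"))
tree_b = (("s1", "s3"), ("s2", "s4"))

print(fitch(tree_a, site)[1])  # 1
print(fitch(tree_b, site)[1])  # 2
```

Tree A explains the site with one change and tree B needs two, so parsimony prefers tree A. In practice the counts are summed over all sites, and several trees can end up with the same total score.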
The maximum likelihood method searches, under an evolutionary model, for the tree that has the highest likelihood of producing the observed data (Baxevanis & Ouellett, 2001). The likelihood is derived for each base position in an alignment. It is calculated in terms of the probability of the variation pattern that would be produced at a site, given a particular tree and the overall observed base frequencies (Baxevanis & Ouellett, 2001). The likelihood becomes the sum of the probabilities of each possible reconstruction of substitutions under a particular substitution process (Refer to Figure 2.4). The substitution model should be optimized to best fit the observed data. Maximum likelihood tends to obtain a single best tree because its calculation involves division and decimals, whereas maximum parsimony merely counts discrete steps (Baxevanis & Ouellett, 2001). Hence, maximum likelihood is well suited to constructing a phylogeny from sequence data.
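The likelihood calculation can be illustrated on the smallest possible case, two aligned sequences joined by a single branch. The sketch assumes the Jukes-Cantor (JC69) substitution model, chosen only because it has a simple closed form; the thesis does not name this model, and the site counts are hypothetical.

```python
import math


def jc69_log_likelihood(n_sites, n_diff, t):
    """Log-likelihood of two aligned sequences separated by branch length t (JC69).

    t is the expected number of substitutions per site and must be positive.
    """
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability that a site shows the same base
    p_diff = 0.25 - 0.25 * decay   # probability of one specific different base
    # Each site contributes (base frequency 1/4) x (transition probability).
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))


def jc69_ml_distance(n_sites, n_diff):
    """Closed-form branch length that maximizes the JC69 likelihood."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical alignment: 100 sites, 10 of which differ.
t_hat = jc69_ml_distance(100, 10)
print(round(t_hat, 4))  # 0.1073
for t in (0.05, t_hat, 0.2):
    print(round(t, 4), round(jc69_log_likelihood(100, 10, t), 3))
```

The log-likelihood is highest at the estimated branch length and lower on either side. Because the score is a continuous quantity, ties between candidate trees are far less common than under parsimony, which counts whole steps.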
Figure 2.2. The principle of the neighbour-joining method is to find pairs of operational taxonomic units (OTUs) that minimize the total branch length at each stage of clustering of OTUs, starting with a starlike tree. (Adopted from Saitou & Nei, 1997).
Figure 2.3. The use of maximum parsimony to choose between two possible trees. In tree A, only one change is needed to explain the pattern of nucleotides at a particular sequence position, while tree B requires two changes in nucleotides. The parsimony method would choose tree A as it provides a simpler explanation of the evolutionary relationship. (Adopted from Vigilant et al., 1991).
Figure 2.4. In the maximum likelihood method, probabilities are considered for every individual nucleotide substitution in a set of sequence alignments. (Adopted from Cawley, 1998).
2.3 Taxonomy classification
Taxonomy is the study of the classification of organisms down to species level based on their common structure, functions and evolutionary history. Taxonomic classification is a hierarchical system consisting of kingdom, phylum, class, order, family, genus and species, which classifies living things into hierarchical groups of organisms (Godfray & Knapp, 2004). Phylogeny is a useful tool for taxonomists to study the evolutionary development of organisms and the relationships between them.
Organisms classed in the same species are capable of interbreeding to produce fertile offspring. Species are the basic unit of classification and are identified by binomial nomenclature. This binomial classification system, proposed by Carl Linnaeus, classifies all living things by assigning each organism a genus and a species name (Luketa, 2012). Taxonomists group related species together in a genus, and genera with similar characteristics are brought together to form a family. Further, related families are classified in an order and orders are grouped in a class. Related classes are brought together to form a phylum. This classification scheme brings various species together into progressively larger groups.
Moving from kingdom to species, the organisms are more closely related (Godfray & Knapp, 2004). For example, Puntigrus tetrazona (Sumatra barb) and Rasbora sarawakensis are from the same family, Cyprinidae, but different subfamilies. Danio rerio and R. sarawakensis are more closely related to each other because they are in the same family, Cyprinidae, and the same subfamily, Danioninae, compared to Puntigrus tetrazona, which is in the subfamily Barbinae. Systematics operates to identify the species and investigate the evolutionary relationships among species (Godfray & Knapp, 2004).
Taxonomic groups can be used to depict the evolutionary relationships of species and construct phylogenetic trees. According to Hodge (2006), Charles Darwin's theory of evolution stated that all modern species arose from earlier species and that all these organisms share a common ancestry. The combination of Linnaeus's and Charles Darwin's works became the organizing principle and formed the foundation of modern taxonomy (Luketa, 2012).
2.4 Cyprinidae family
The order Cypriniformes is traditionally divided into two superfamilies, Cyprinioidea and Cobitioidea (Mayden, 2008). The family Cyprinidae is the largest of all known freshwater fish families. All members of this family are termed cyprinids (Refer to Figure 2.5). They are native to North America, Africa and Eurasia, which is the combined continental landmass of Europe and Asia (Zardoya & Doadrio, 1999). According to Nelson (2006), over 2,420 species of cyprinids and about 220 genera had been recognized. These are estimated to account for more than eight percent of the world's known fishes. According to current molecular and phylogenetic data together with fossil evidence, cyprinids may have originated in Asia in the Eocene (Zardoya & Doadrio, 1999). The cyprinids are primary freshwater fishes and cannot tolerate high-salinity environments. Although a few species, for example Rutilus rutilus, spend part of their lifetime in brackish water, they cannot reproduce in it, especially when the pH of the water is below 5.5 (Meri et al., 2008).
Cyprinids are characterized by their toothless jaws and palate (Mayden, 2008). They chew their food using pharyngeal teeth in their throat, and the number of pharyngeal teeth is an important characteristic in distinguishing species. As members of the superorder Ostariophysi, cyprinids have a Weberian organ, a set of bony ossicles that physically connects the inner ear to the swim bladder and thus amplifies sound waves, allowing a greater range of auditory stimuli to be detected (Mayden, 2008). Cyprinids usually have thin lips, large scales, absence of