LAB MANUAL BIOPERL PROGRAMMING LANGUAGE

DBT STAR STATUS Department of BCA

Published by

Coimbatore Institute of Information Technology #156, 3rd Floor, Kalidas Road, Ramnagar, Coimbatore – 641009, Tamil Nadu, India. Website: www.ciitresearch.org

Complimentary Copy All Rights Reserved. Original English Language Edition 2020 © Copyright by Coimbatore Institute of Information Technology.

This book may not be duplicated in any way without the express written consent of the publisher, except in the form of brief excerpts or quotations for the purpose of review. The information contained herein is for the personal use of the reader and may not be incorporated in any commercial programs, other books, database, or any kind of software without written consent of the publisher. Making copies of this book or any portion thereof for any purpose other than your own is a violation of copyright laws.

This edition has been published by Coimbatore Institute of Information Technology, Coimbatore.

Limits of Liability/Disclaimer of Warranty: The author and publisher have used their effort in preparing this Bioperl Programming Language book and author makes no representation or warranties with respect to accuracy or completeness of the contents of this book, and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. There are no warranties which extend beyond the descriptions contained in this paragraph. No warranty may be created or extended by sales representatives or written sales materials. Neither CiiT nor author shall be liable for any loss of profit or any other commercial damage, including but limited to special, incidental, consequential, or other damages.

Trademarks: All brand names and product names used in this book are trademarks, registered trademarks, or trade names of their respective holders.

ISBN 978-93-891053-7-7

This book is printed in 70 gsm papers. Printed in India by Mahasagar Technologies.

Coimbatore Institute of Information Technology, #156, 3rd Floor, Kalidas Road, Ramnagar, Coimbatore – 641009, Tamil Nadu, India. Phone: 0422-4377821 www.ciitresearch.org

ii Complimentary Copy

Authors

Dr. M. S Vijaya K. Geethalakshmi

S. Mohanapriya M. Selvanayaki

T. S. Anushya Devi T. Saranya

A. Kavitha L. Sheeba

G. Sangeetha S. Kavitha

iii Complimentary Copy Notes **********

iv Complimentary Copy

Dr. R. SUBASHKUMAR, M.Sc., Ph.D., DMLT., PGDNBT., Associate Professor, Department of Biotechnology, Sri Ramakrishna Arts & Science College, (Autonomous, Affiliated to Bharathiar University) Coimbatore, Tamil Nadu. India.

The Lab Manual of Bioperl Programming Language is a collection of fourteen titled modules that facilitate the development of scripts, its use in the analysis of biological data, exploring approaches, emerging methodologies, and tools that can give biological meaning to the data generated. The manual has been prepared with prioritised methodology viz., operation of tools using the suitable methods, annotated algorithms, self- descriptive results, etc. I hope that the practical manual will also be of interest to computational biologists who want to learn and practice a little more about the biological questions related to the field of . This book is intended to ease the learning curve for new users of bioperl, as well as a user guide for a specific set of DNA and protein sequence analysis programs and enables developing scripts that can analyze large quantities of sequence data in ways that are typically difficult or impossible with web based systems. The book has been carefully reviewed by the team experts and highly appreciated to fulfil the requirement of the DBT STAR College Scheme.

v Complimentary Copy Notes **********

vi Complimentary Copy

Mr. Vivek Chandramohan, Assistant Professor, Department of Biotechnology, Siddaganga Institute of Technology, Tumkur-572103, Karnataka, India.

I have gone through the entire manual of “Bioperl Programming Language” lab. This manual is written very well and the concepts are very clear. The Bioperl is used for many application, in this manual deals with, BLAST, Multiple sequences alignment, prediction algorithm, and some mathematical model. Finding mutation in given sequences is most impotent clinical genomics.

vii Complimentary Copy Notes **********

viii Complimentary Copy Introduction to Bioperl: Bioperl is a collection of perl modules that facilitate the development of perl scripts for bio-informatics applications. It provides reusable perl modules that facilitate writing perl scripts for sequence manipulation, accessing of databases using a range of data formats and execution and parsing of the results of various molecular biology programs including Blast, clustalw, TCoffee, Genscan, ESTscan and HMMER. Bioperl enables developing scripts that can analyse large quantities of sequence data in ways that are typically difficult or impossible with web based systems. Using Bioperl: Bioperl provides software modules for many of the typical tasks of bioinformatics programming. These include:  Accessing sequence data from local and remote databases  Transforming formats of database/ file records  Manipulating individual sequences  Searching for "similar" sequences  Creating and manipulating sequence alignments  Searching for and other structures on genomic DNA  Developing machine readable sequence annotation Installation: The actual installation of the various system components is accomplished in the standard manner:  Locate the package on the network  Download  Decompress (with gunzip or a simliar utility)  Remove the file archive (eg with tar -xvf)  Create a ‘‘makefile’’ (with ‘‘perl Makefile.PL’’ for perl modules or a supplied ‘‘install’’ or  ‘‘configure’’ program for non-perl program  Run ‘‘make’’, ‘‘make test’’ and ‘‘make install’’ This procedure must be repeated for every CPAN.

ix Complimentary Copy  Module, Bioperl-extension and external module to be installed. A helper module CPAN.pm is Available from CPAN which automates the process for installing the perl modules.

x Complimentary Copy BIOPERL PROGRAMMING LANGUAGE

Syllabus:

1. To align the sequence using Local Blast. 2. Write a script to search for genes from Genscan. 3. Protein Sequence Generation. 4. To count start and stop codons in a sequence. 5. To calculating the reverse complement of DNA Sequence. 6. To concatenating DNA Fragments Transcription: DNA to RNA. 7. Program to simulate DNA Mutation. 8. Write a code to check file extensions using Perl. 9. Write a code to search the given string in an entered sequence. 10. To compute total and average length of the proteins in these files. 11. Reading a protein sequence from a file. 12. Write a sequence in a file.

xi Complimentary Copy Notes **********

xii Complimentary Copy List of Experiments

S. No Programs Page No

01 Align the Sequence using Local Blast 1

02 Gene from Genscan 5

03 Protein Sequence Generation 7

04 To Count Start and Stop Codons in a Sequence 11

05 Reverse Complement of DNA 13

06 DNA Fragments Transcription 16

07 DNA Mutation 21

08 To Check File Extensions 27

09 To Search String in Sequence 29

10 Total and Average length of Proteins in files 31

11 Reading Protein Sequence From Files 37

12 Sequence into a file 43

xiii Complimentary Copy Notes **********

xiv Complimentary Copy BioPerl Programming Language  1

1. ALIGN THE SEQUENCE USING LOCAL BLAST Prerequisites: NCBI, BLAST Tools. Aim: To write a program to align the sequence using local blast. Steps Using Tools: 1) Open the NCBI website. 2) Choose the BLAST under popular resources. 3) According to the chosen sequence choose Protein BLAST or Nucleotide BLAST. 4) Enter the sequence. 5) Choose Others in Database option. 6) Choose Nucleotide in the following option. 7) Choose BLAST option to view the output.

Department of BCA Complimentary Copy 2  BioPerl Programming Language

Department of BCA Complimentary Copy BioPerl Programming Language  3

Department of BCA Complimentary Copy 4  BioPerl Programming Language

Department of BCA Complimentary Copy BioPerl Programming Language  5

2. GENE FROM GENSCAN Prerequisites: Perl Software, Notepad.

Aim: To write a script to search for genes in the Genscan.

Algorithm: Step 1: Start the process. Step 2: Open programs folder in Perl Folder. Step 3: Open Notepad and type the program. Step 4: Save the program as sel.pl in the folder which is created in perl. Step 5: Run the program in Command Prompt as perl sel.pl. Step 6: Check the output in the perl folder. Step 7: Stop the program. Program: use Bio::Seq; use Bio::SeqIO; $seq_obj = Bio::Seq-> new (-seq=>"aaaatgggggggggggccccgtt", -display_id => "#12345", -desc => "example 1", -alphabet => "dna"); $seqio_obj = Bio::SeqIO->new(-file => '>perlseq1.fasta', -format => 'fasta' ); $seqio_obj->write_seq($seq_obj);

Output: C:\>cd perl64 C:\Perl64>cd programs C:\Perl64\Programs>perl sel.pl C:\Perl64\Programs>

Department of BCA Complimentary Copy 6  BioPerl Programming Language

Department of BCA Complimentary Copy BioPerl Programming Language  7

3. PROTEIN SEQUENCE GENERATION Prerequisites: Expasy and UniProtKB Tools Aim: To write a program for generating protein sequence. Steps Using Tools: 1. Search Expasy tool in the Google. 2. Click the Proteomics column. 3. Under the databases, click UniProtKB link. 4. Copy the protein sequence in the text file.

Department of BCA Complimentary Copy 8  BioPerl Programming Language

Department of BCA Complimentary Copy BioPerl Programming Language  9

Department of BCA Complimentary Copy 10  BioPerl Programming Language

Department of BCA Complimentary Copy BioPerl Programming Language  11

4. TO COUNT START AND STOP CODONS IN A SEQUENCE Prerequisites: Perl Software, Notepad Aim: To write a program to count the stop and start codons in the sequence Algorithm: Step 1: Start the process Step 2: Start -> all programs -> notepad Step 3: Enter and store the input sequence Step 4: Ask the user to start the start and stop codon Step 5: Use for and if loop to find the start and stop codon Step 6: Print the start and stop codon value Step 7: Using command prompt run the output Step 8: Stop the process Program: #!/usr/bin/perl print "Enter sequence:\n"; #Asking user to enter input sequence $seq = ; chomp($seq); $seqlen=length($seq); $startcod=0; $stopcod=0; print "Enter start codon or stop codon to find:\n"; #Asking user to enter start codon or stop codon to find $startcodon="AUG"; $stopcodon="UAG "; My @codons= unpack("(A3)*",$seq); for( $a = 0; $a < $seqlen; $a++ ) { if ($codons[$a] eq $startcodon) { $startcod++; }

Department of BCA Complimentary Copy 12  BioPerl Programming Language

if ($codons[$a] eq $stopcodon) { $stopcod++; } } print "Total no. of start codon is: $startcod \n"; print "Total no. of stop codon is: $stopcod \n"; Output:

Department of BCA Complimentary Copy BioPerl Programming Language  13

5. REVERSE COMPLEMENT OF DNA Prerequisites: Perl Software, Notepad, SMS Tool. Aim: To write a program to calculate the reverse complement of DNA. Algorithm: Step 1: Start the process. Step 2: Open programs folder in Perl Folder. Step 3: Open Notepad and type the program. Step 4: Save the program as reverseseq.pl in the folder which is created in perl. Step 5: Run the program in Command Prompt as perl reverseseq.pl. Step 6: Stop the program. Program: #!/usr/bin/perl –w # Calculating the reverse complement of a strand of DNA # The DNA $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC'; # Print the DNA onto the screen print "Here is the starting DNA:\n\n"; print "$DNA\n\n"; # Make a new (reverse) copy of the DNA $revcom = reverse $DNA; print "Reverse copy of DNA:\n\n$revcom\n\n"; # Translate A->T, C->G, G->C, T->A, s/// won't work! $revcom =~ tr/ACGT/TGCA/; # Print the reverse complement DNA onto the screen print "Here is the reverse complement DNA:\n\n$revcom\n"; exit; Output: C:\Perl64\Programs>perl reverseseq.pl Here is the starting DNA:

Department of BCA Complimentary Copy 14  BioPerl Programming Language

ACGGGAGGACGGGAAAATTACTACGGCATTAGC Reverse copy of DNA: CGATTACGGCATCATTAAAAGGGCAGGAGGGCA Here is the reverse complement DNA: GCTAATGCCGTAGTAATTTTCCCGTCCTCCCGT C:\Perl64\Programs> Steps Using Tools: 1) Search for SMS in Google. 2) Choose Reverse Compliment option. 3) Enter the saved sequence & press submit to view the output.

Department of BCA Complimentary Copy BioPerl Programming Language  15

Department of BCA Complimentary Copy 16  BioPerl Programming Language

6. DNA FRAGMENTS TRANSCRIPTION Prerequisites: Perl Software, Notepad, Attotron Tool. Aim: To write a program to concatenate DNA fragments transcription. Algorithm: Step 1: Start the process. Step 2: Open programs folder in Perl Folder. Step 3: Open Notepad and type the program. Step 4: Save the program as transdna.pl in the folder which is created in perl. Step 5: Run the program in Command Prompt as perl transdna.pl. Step 6: Stop the program. Program: #!/usr/bin/perl -w # Transcribing DNA into RNA

# The DNA $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Print the DNA onto the screen print "Here is the starting DNA:\n\n"; print "$DNA\n\n";

# Transcribe the DNA to RNA by substituting all T's with U's. $RNA = $DNA;

$RNA =~ s/T/U/g;

# Print the RNA onto the screen print "Here is the result of transcribing the DNA to RNA:\n\n";

Department of BCA Complimentary Copy BioPerl Programming Language  17 print "$RNA\n";

# Exit the program. exit; Output: C:\Perl64\Programs>perl transdna.pl Here is the starting DNA: ACGGGAGGACGGGAAAATTACTACGGCATTAGC Here is the result of transcribing the DNA to RNA: ACGGGAGGACGGGAAAAUUACUACGGCAUUAGC C:\Perl64\Programs> Steps Using Tools: 1) Search Attotron in Google. 2) Place the Sequence in the query box & click Transcribe option to view the output

Department of BCA Complimentary Copy 18  BioPerl Programming Language

Department of BCA Complimentary Copy BioPerl Programming Language  19

Department of BCA Complimentary Copy 20  BioPerl Programming Language

Notes **********

Department of BCA Complimentary Copy BioPerl Programming Language  21

7. DNA MUTATION Prerequisites: Perl Software, Notepad. Aim: To write a program to perform DNA mutation. Algorithm: Step 1: Start the process. Step 2: Open programs folder in Perl Folder. Step 3: Open Notepad and type the program. Step 4: Save the program as in the folder which is created in perl. Step 5: Run the program in Command Prompt. Step 6: Enter the file containing DNA sequence to perform mutation during execution. Step 7: Stop the program. Program: # This script can be used to Mutate a DNA Sequence. # This script randomly mutates the DNA sequence and generates 10 successive mutation results. # While executing this script it asks for the file name of the DNA sequence. # If the DNA sequence file is not in the same directory of this script, enter the file name with its full path. # Example: # In windows: c:\rnafile.txt # In Linux: /home/user/sequence/rnafile.txt use File::Path; print "\n\t#################### MUTATION OF DNA ####################\n\n"; print "ENTER THE FILENAME OF THE DNA SEQUENCE:= "; $dnafilename = ; chomp $dnafilename; unless ( open(DNAFILE, $dnafilename) ) { print "Cannot open file \"$dnafilename\"\n\n";

Department of BCA Complimentary Copy 22  BioPerl Programming Language

goto h; } my $DNA = ; close DNAFILE; my $i; my $mutant; $mutant = mutate($DNA); print "Mutate DNA\n\n"; print "HERE IS THE ORIGINAL DNA SEQUENCE:\n"; print "$DNA\n\n"; print "HERE IS THE MUTANT DNA SEQUENCE:\n"; print "$mutant\n\n"; print "HERE ARE THE 10 SUCCESSIVE MUTATIONS:\n\n"; for ($i=0 ; $i < 10 ; ++$i) { $mutant = mutate($mutant); print "$mutant\n"; print WRITE "$mutant\n"; } sub mutate { my($dna) = @_; my($position) = randomposition($dna); my $current_base = substr($dna, $position, 1); my $newbase; do { $newbase = randomnucleotide(); } until ($newbase ne $current_base); substr($dna,$position,1,$newbase); return $dna; }

Department of BCA Complimentary Copy BioPerl Programming Language  23 sub randomposition { my($string) = @_; return int rand length $string; } sub randomelement { my(@array) = @_; return $array[rand @array]; } sub randomnucleotide { my(@nucleotides) = ('A', 'C', 'G', 'T'); return randomelement(@nucleotides); } Sample DNA Sequence - Save it as "dnafile.txt” AATTCATTTTTAATCCTTTAATAGTCCACAGTAATATTGTCCTAAAGAGGGTA CATTGG ATTTTAATTTTGCTTTCAATATGACGGCTGTCAATGTTGCCCTGATTCGTGAT ACCAAGTG GCTGACTTTAGAAGTCTGTAGAGAATTTCAGAGAGGAACTTGCTCTCGAGC TGATGCAGATT GCAAGTTTGCCCATCCACCAAGAGTTTGCCATGTGGAAAATGGTCGTGTGG TGGCCTGTTTTG ATTCTCTAAAGGGTCGGTGTACCCGAGAGAACTGCAAGTACCTTCACCCTC CTCCACACTTAA AAACGCAGCTGGAGATTAATGGGCGGAACAATCTGATTCAACAGAAGACTG CCGCAGCCATGTT CGCCCAGCAGATGCAGCTTATGCTCCAAAACGCTCAAATGTCATCACTTGG TTCTTTTCCTATG ACTCCATCAATTCCAGCTAATCCTCCCATGGCTTTCAATCCTTACATACCAC ATCCTGGGATGGG

Department of BCA Complimentary Copy 24  BioPerl Programming Language

CCTCGTTCCTGCAGAACTTGTACCAAATACACCTGTTCTGATTCCTGGAAAC CCACCTCTTGCAAT GCCAGGAGCTGTTGGCCCAAAACTGATGCGTTCAGATAAACTGGAGGTTTG CCGA Output: #################### MUTATION OF DNA #################### ENTER THE FILENAME OF THE DNA SEQUENCE:= Mutate DNA HERE IS THE ORIGINAL DNA SEQUENCE: AATTCATTTTTAATCCTTTAATAGTCCACAGTAATATTGTCCTAAAGAGGGTA CATTGGATTTTAATTTTGCTTTCAATATGACGGCTGTCAATGTTGCCCTGATT CGTGATACCAAGTG HERE IS THE MUTANT DNA SEQUENCE: AATTCATTCTTAATCCTTTAATAGTCCACAGTAATATTGTCCTAAAGAGGGTA CATTGGATTTTAATTTTGCTTTCAATATGACGGCTGTCAATGTTGCCCTGATT CGTGATACCAAGTG HERE ARE THE 10 SUCCESSIVE MUTATIONS: AATTCATTCTTAATCCTTTAATAGTCCACAGTAATATTGTCCTAAAGAGGGTA CATTGGATTTTAATTTTGCTTTCAATATGACGGCTGTCAATGTTGCCCTGATT CGGGATACCAAGTG AATTCATTCTTAATCCTTTAATAGTCCACAGTAATATTGTCCTAAAGAGGGTA CATTGGATTTTAATTTTGCTTTCAATATGACGGCTGTCAATGTTGCCCTGATT CGAGATACCAAGTG AATTCTTTCTTAATCCTTTAATAGTCCACAGTAATATTGTCCTAAAGAGGGTA CATTGGATTTTAATTTTGCTTTCAATATGACGGCTGTCAATGTTGCCCTGATT CGAGATACCAAGTG AATTCTTTCTTAATCCTTTAATAGTCCACAGTAATATTGTCCTGAAGAGGGTA CATTGGATTTTAATTTTGCTTTCAATATGACGGCTGTCAATGTTGCCCTGATT CGAGATACCAAGTG AATTCTTTCTTAATCCTTTAATAGTCCACAGTAATATTGTCCTGAAGAGGGTA CATTTGATTTTAATTTTGCTTTCAATATGACGGCTGTCAATGTTGCCCTGATT CGAGATACCAAGTG

Department of BCA Complimentary Copy BioPerl Programming Language  25

AATTCTTTCTTAATCCTTTAATAGTCCACAGTAATATTGTCCTGAAGAGGGTA CATTTGATTTTAATTTTGCTTTCAATATGACGGCTGTCAATGTTGCCCTGATT CGAGATACAAAGTG AATTCTTTCTTAATCCTTTAATAGTCCACAGTAATATTGTCCTGAAGAGGGTA CATTTGATTTTAATTTTGCTTTCAATATGACGGCTGTCAATGTTGCCCAGATT CGAGATACAAAGTG AATTCTTTCTTAATCCTTTAATAGTACACAGTAATATTGTCCTGAAGAGGGTA CATTTGATTTTAATTTTGCTTTCAATATGACGGCTGTCAATGTTGCCCAGATT CGAGATACAAAGTG AATTCTTTCTTAATCCTTTAATAGTACCCAGTAATATTGTCCTGAAGAGGGTA CATTTGATTTTAATTTTGCTTTCAATATGACGGCTGTCAATGTTGCCCAGATT CGAGATACAAAGTG AATTCTTTCTTAATCCTTTAATAGTACCCAGTAATAATGTCCTGAAGAGGGTA CATTTGATTTTAATTTTGCTTTCAATATGACGGCTGTCAATGTTGCCCAGATT CGAGATACAAAGTG

Department of BCA Complimentary Copy 26  BioPerl Programming Language

Notes **********

Department of BCA Complimentary Copy BioPerl Programming Language  27

8. TO CHECK FILE EXTENSIONS Prerequisites: Perl Software, Notepad. Aim: To write a Bioperl program code to check the file extension. Algorithm: Step 1: Start the process Step 2: Start -> all programs -> notepad Step 3: Enter two files- file1, file2 with their extension Step 4: Using ext and split command split the two file, file1 and file2 Step 5: Print the file1 as fasta file and file2 as text file. Step 6: Run it on command prompt and note the output Step 7: Stop the process Program: #!usr/bin/perl $file1="ref_human.fasta"; $file2="seq_mouse.txt"; @ext1=split /\./, $file1; @ext2=split /\./,$file2; if (@ext1[1] eq "fasta") { print "File1 format is fasta\n"; } else { print "File1 format is text\n"; } if (@ext2[1] eq "fasta") { print "File2 format is fasta\n"; } else { print "File2 format is text\n"; }

Department of BCA Complimentary Copy 28  BioPerl Programming Language

Output:

Department of BCA Complimentary Copy BioPerl Programming Language  29

9. TO SEARCH STRING IN SEQUENCE Prerequisites: Perl Software, Notepad Aim: To write a program to search given word (string) in an entered sequence. Algorithm: Step 1: Start the process Step 2: Start -> all programs -> notepad Step 3: Enter and store the input sequence Step 4: Enter the input string to find the input sequence Step 5: Use for loop to find the string count Step 6: Using the print statement find the occurrences of the string Step 7: Save the program and open the command prompt Step 8: Give the command Cd and paste the address to change the directory Step 9: Enter the statement perl filename .pl to run the file. Step 10: Stop the process Program: #!/usr/bin/perl print "Enter #Asking user to enter input sequence sequence:\n"; $str=; #Variable to store input sequence chomp($str); #Removing new line by chomping input seq $strl=length($str); #finding actual seq length print "Enter string to find in sequence:\n"; #Asking user to enter string to find in input seq. $find=; #Variable to store string to find chomp($find); ##Removing new line by chomping string to s $findlen=length($find); #finding length of string to find $fc=0; #initialized variable to store occurance of string found @codons= unpack("(A$findlen)*",$str); #Making chunks of input seq depending on length of string for ($i=0; $i<$strl; $i++) #for loop to find string

Department of BCA Complimentary Copy 30  BioPerl Programming Language

{ if ($codons[$i] eq $find) { $fc++; #incrementing string found count based on match found. } } print "$find found $fc time.\n"; #Printing occurance of string found Output:

Department of BCA Complimentary Copy BioPerl Programming Language  31

10. TOTAL AND AVERAGE LENGTH OF PROTEINS IN FILES Prerequisites: Notepad, Expasy Tool Aim: To write a program to compute total and average length of the proteins in the files. Steps Using Tools: 1. Search Expasy tool on the Google. 2. Choose Proteome in the column. 3. Choose function analysis under that. 4. Below the database Uniprot KB click on the link. 5. Choose Proteome and choose any Id. 6. Copy the protein sequence in the text file. 7. Choose protpram in Expasy. 8. Paste the sequence to view the output.

Department of BCA Complimentary Copy 32  BioPerl Programming Language

Department of BCA Complimentary Copy BioPerl Programming Language  33

Department of BCA Complimentary Copy 34  BioPerl Programming Language

Department of BCA Complimentary Copy BioPerl Programming Language  35

Department of BCA Complimentary Copy 36  BioPerl Programming Language

Department of BCA Complimentary Copy BioPerl Programming Language  37

11. READING PROTEIN SEQUENCE FROM FILES Prerequisites: Perl Software, Notepad, Expasy Tool, UniProtKB. Aim: To write a program to read protein sequence from protein files. Algorithm: Step 1: Start the process. Step 2: Open the file containing the sequence data. Step 3: Read the protein sequence from the file and store it in an array variable. Step 4: Print the protein into the screen. Step 5: Close the file and stop the process. Program: #!/usr/bin/perl -w # Reading protein sequence data from a file, take 3

# The filename of the file containing the protein sequence data $proteinfilename = 'NM_021964fragment.pep';

# First we have to "open" the file open(PROTEINFILE, $proteinfilename);

# Read the protein sequence data from the file, and store it # into the array variable @protein @protein = ;

# Print the protein onto the screen print @protein;

# Close the file. close PROTEINFILE; exit;

Department of BCA Complimentary Copy 38  BioPerl Programming Language

Output: MNIDDKLEGLFLKCGGIDEMQSSRTMVVMGGVSGQSTVSGELQDSVLQDR SMPHQEILAADEVLQESEMRQQDMISHDELMVHEETVKNDEEQMETHERL PQGLQYALNVPISVKQEITFTDVSEQLMRDKKQIR Steps Using Tools: 1. Search for Expasy tool in the Google 2. Click the Proteomics column. 3. Below the databases, click on UniProtKB link. 4. Copy the protein sequence in the text file.

Department of BCA Complimentary Copy BioPerl Programming Language  39

Department of BCA Complimentary Copy 40  BioPerl Programming Language

Department of BCA Complimentary Copy BioPerl Programming Language  41

Department of BCA Complimentary Copy 42  BioPerl Programming Language

Notes **********

Department of BCA Complimentary Copy BioPerl Programming Language  43

12. SEQUENCE INTO A FILE Prerequisites: Perl Software, Notepad. Aim: To write a program to write a sequence into a file. Algorithm: Step 1: Start the process Step 2: Start -> all programs -> notepad Step 3: Ask the user to enter the filename with extension Step 4: Ask the user to enter the D.O.B and name Step 5: Get the input using STDIN command Step 6: Run the program in command prompt and see to the text file being copied in the folder Step 7: Stop the process Program: #!/usr/bin/perl print "Enter your file name with extension:\n"; $filename = ; chomp($filename); print "What is your name?\n"; $name=; chomp($name); print "\nWhat is DOB?\n"; $dob=; chomp($dob); open(my $fh, '>', $filename) or die "Could not open file '$filename' $!"; print $fh "Name: $name\nDOB: $dob\n"; close $fh; print "done\n";

Department of BCA Complimentary Copy 44  BioPerl Programming Language

Output:

Department of BCA Complimentary Copy BioPerl Programming Language  45

Notes **********

Department of BCA Complimentary Copy 46  BioPerl Programming Language

Notes **********

Department of BCA Complimentary Copy