LINUX Command Line Reference for Bioinformatics
DIRECTORIES

CREATE/DELETE DIRECTORIES
mkdir SeqDir
    Make the directory SeqDir.
rmdir SeqDir
    Remove the empty directory SeqDir.
rm -rf SeqDir/
    If the directory is not empty, you can delete the directory and all of its subdirectories and files using the rm command with the r and f options.

NAVIGATION
cd SeqDir
    Change your current working directory to SeqDir. Using 'cd' without any options will take you to your home directory.
cd ..
    Change to the parent directory.
cd /home/username/Dir/SubDir
    Change directory using the full directory path.

DIR INFORMATION
pwd
    List the full path of your current directory.
ls
    List the files in the current directory.
ls -alh
    List all files (including hidden ones) and show file sizes in a human readable format.

FILES
More information on dealing with large text files is listed under the FASTA FILES heading.

COPY, RENAME & MOVE FILES
cp MySeqs.fasta MyCopy.fasta
    Copy the file to MyCopy.fasta.
cp *.fasta SeqDir/
    Copy all files with the fasta extension to the destination folder.
mv MySeqs.fasta New.fasta
    Rename the MySeqs.fasta file to New.fasta.
mv *.fasta SeqDir/
    Move all files with the fasta extension to the SeqDir directory.

FILE PERMISSIONS
chmod ### MyFile.txt    e.g. chmod 755 MyProgram.pl
    Change the permissions associated with files and directories (e.g. make Perl programs executable). The ### are the numeric permission codes for the owner, the group and all other users, in that order.

FILE COMPRESSION
gzip MyFile.txt
    Compress MyFile.txt with gzip; it is replaced by MyFile.txt.gz.
gunzip MyFile.txt.gz
    Unzip the file back to MyFile.txt.
bzip2 MyFile.txt
    Compress the file with bzip2, which usually compresses better than gzip (but more slowly); it is replaced by MyFile.txt.bz2.

UNCOMPRESS TREE TO DIR
The sequence of commands below allows you to uncompress an entire directory tree into a single directory. This is useful if you have downloaded sequence trace files from GenBank and would like all of the data in a single directory.
mkdir -p output
tar -C output -zxf archive.tar.gz
cd output
find . -type f -exec mv -i {} . \;
find * -type d -prune -exec rm -rf {} \;
    tar -C extracts into the directory output, which must already exist. The first find moves every file up into output (mv -i asks before overwriting a file with the same name); the second deletes the subdirectories and anything still left in them.
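Counting the files in a directory with ls and wc -l works because ls prints one name per line when its output goes to a pipe. Note that ls -l adds a "total" line before the listing, so ls -l | wc -l reports one more than the number of files. A minimal sketch, assuming file names contain no newlines; count_entries is a hypothetical helper name.

```shell
# Hypothetical helper: count the (non-hidden) entries in a directory.
# ls writes one name per line into a pipe, so wc -l counts the entries.
# Plain ls skips dot files; use ls -A instead to include them.
count_entries() {
    local dir=${1:-.}
    # $(( )) strips the leading spaces that some wc versions print.
    echo $(( $(ls -- "$dir" | wc -l) ))
}
```

For example, count_entries SeqDir prints the number of files in SeqDir.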
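The numeric chmod code has one digit each for the owner, the group and other users, and each digit is the sum r=4, w=2, x=1. So 755 means rwx (4+2+1) for the owner and r-x (4+1) for the group and for others. The sketch below does that arithmetic for a permission string as printed by ls -l. perm_to_octal is a hypothetical name, and the sketch ignores the special setuid, setgid and sticky bits.

```shell
# Hypothetical helper: convert a 9-character permission string such as
# rwxr-xr-x (owner, group, others) into the numeric code chmod accepts.
perm_to_octal() {
    local perms=$1 code="" i digit triple
    if [[ ! $perms =~ ^[r-][w-][x-][r-][w-][x-][r-][w-][x-]$ ]]; then
        echo "expected a string like rwxr-xr-x" >&2
        return 1
    fi
    for i in 0 3 6; do                  # owner, group, others
        triple=${perms:i:3}
        digit=0
        [[ ${triple:0:1} == r ]] && digit=$((digit + 4))   # read
        [[ ${triple:1:1} == w ]] && digit=$((digit + 2))   # write
        [[ ${triple:2:1} == x ]] && digit=$((digit + 1))   # execute
        code+=$digit
    done
    echo "$code"
}
```

Usage: chmod "$(perm_to_octal rwxr-xr-x)" MyPerl.pl has the same effect as chmod 755 MyPerl.pl.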
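Large FASTA files are often kept compressed. Instead of running gunzip or bunzip2 first, the decompressed text can be streamed with gzip -dc or bzip2 -dc straight into a header count like grep -c '>'. A minimal sketch, assuming headers are the lines that begin with '>' and that the extension (.gz, .bz2) tells which decompressor to use; count_fasta_records is a hypothetical name.

```shell
# Hypothetical helper: count FASTA records in a plain, .gz or .bz2 file
# without writing an uncompressed copy to disk.
count_fasta_records() {
    local file=$1
    case $file in
        *.gz)  gzip -dc "$file" ;;     # decompress to stdout
        *.bz2) bzip2 -dc "$file" ;;    # decompress to stdout
        *)     cat "$file" ;;          # assume plain text
    esac | grep -c '^>'                # count header lines
}
```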
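The UNCOMPRESS TREE TO DIR steps can be wrapped in a shell function. This sketch (flatten_archive is a hypothetical name) changes two details for non-interactive use: mv -n skips a name clash instead of prompting like mv -i, and only subdirectories that are empty after the move are removed, so a skipped file stays in place rather than being deleted. It assumes a gzipped tar archive and a find that supports -mindepth, -empty and -delete (GNU and BSD find both do).

```shell
# Hypothetical helper: unpack ARCHIVE (.tar.gz) into OUTDIR and move every
# file from the subdirectories up into OUTDIR itself.
flatten_archive() {
    local archive=$1 outdir=$2
    mkdir -p "$outdir" || return 1              # tar -C needs an existing dir
    tar -C "$outdir" -zxf "$archive" || return 1
    (
        cd "$outdir" || exit 1
        # Files at depth >= 2 live in subdirectories; move them up.
        # -n: never overwrite a file that already has the same name.
        find . -mindepth 2 -type f -exec mv -n {} . \;
        # Remove subdirectories that are now empty (deepest first).
        find . -mindepth 1 -type d -empty -delete
    )
}
```

Usage: flatten_archive archive.tar.gz output leaves all the files directly in output.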
DIR COMPRESSION
tar -cvf SeqDir.tar SeqDir/    or    tar -czvf SeqDir.tar.gz SeqDir/
    Use the tar (Tape Archive) program to archive the directory named SeqDir. The z option also compresses the archive with gzip. Keep f last in the option group, because the archive name must directly follow it.
tar -xvf SeqDir.tar
    Use x to extract the tar archive (tar -xzvf SeqDir.tar.gz for a gzipped one).
bunzip2 MyFile.txt.bz2
    Unzip a single bzipped file.

FIND FILES
locate MyFile.txt
    The locate command can be used to find the location of files on your hard drive. It searches a prebuilt index (updated by updatedb), so very recently created files may not show up yet.

GENERAL PROGRAMS
man progname
    The man command displays the manual for Linux command line programs.
nohup
    The nohup command allows you to close your terminal connection to the Linux machine but keep your program running, e.g. nohup ./MyPerl.pl & (output that would go to the terminal is written to nohup.out).
clear
    Clear the screen.
passwd
    Set your user password.

SPECIAL CHARACTERS
|    Pipe Output
    You can pass output from one program to another program using the pipe character |. Examples are: ls | less, head myfile | less, or ls | wc -l to count the files in a directory.
>    Write to File
    The > character can be used to send program output to a text file, overwriting any existing content. Examples: ls > File.txt or locate perl > Perl.txt
>>   Append to File
    Results are appended to the end of the outfile.
*    Wildcard Character
    The asterisk is often used as the wildcard character. It matches any string of characters of any length, including none. Example use: ls *.fasta
?    Wildcard Character
    The question mark is a wildcard for a single character.

USERS
who
    Show who else is logged on.
finger username
    Get information about the user, including real name, home directory etc.

RESOURCE USAGE
It is important to keep track of your resource usage in multiuser environments. The following commands help you keep track of your storage and processor use on your Linux machine.

DISK USAGE
quota
    See what your disk usage quota is on the current machine. You may have no quota.
df -h
    Show the used and available disk space on each mounted filesystem, i.e. the space used by you and everyone else on the server.
du -h --max-depth=1
    Display your disk use in the current directory. This is a good way to check which files or directories are using up disk space.

PROCESSES
top
    Display the top CPU processes on the local machine. This will show the processes as well as their memory and processor usage.
ps -ef
    Show all processes currently running.
ps -ef | grep username
    Show only your processes. If a process is running that you want to stop, use the 'kill' command.
kill PID
    Kill the process identified by the process ID (PID). The PID can be determined using the ps utility. WARNING: 'kill -9' is the nuclear option and will kill the hell out of your runaway process; because the process gets no chance to clean up, it may however trash your database, files etc.

FASTA FILES
For a FASTA file named MySeqs.fasta:

FILE OVERVIEW
ls *.fasta
    Show all fasta files in the directory.
grep -c '^>' MySeqs.fasta
    Count the number of sequence records (lines that begin with >).
wc -l MySeqs.fasta
    Count the number of lines in the file.
wc -c MySeqs.fasta
    Count the total number of characters (bytes, newlines included) in the file.

VIEW FILE
less MySeqs.fasta
    View the entire fasta file.
head -n 50 MySeqs.fasta    or    head -n 300 MySeqs.fasta | less
    Look at the beginning of a fasta file. Use -n to select the number of lines. For large n, pipe the output to the less utility.
tail -n 50 MySeqs.fasta    or    tail -n 300 MySeqs.fasta | less
    Look at the end of the fasta file.

MERGE AND SPLIT FILES
cat *.fasta > AllSeqs.fasta
    Combine all fasta files in the current directory into a single file. Run it while AllSeqs.fasta does not exist yet, since the output file would itself match *.fasta.
csplit AllSeqs.fasta '/^>/' '{*}'    or    csplit -f Seq -n 8 AllSeqs.fasta '/^>/' '{*}'
    Split the fasta file into a separate fasta file for each record. Because the file starts with a header line, the first output file is empty (GNU csplit's -z option removes empty output files). The following options are available for csplit:
    -f    Prefix for output file names
    -n    Number of digits in output file names
    -b    Suffix format for output file names

BASIC PERL
For a PERL program named MyPerl.pl:

MODIFY AND RUN PROGRAMS
emacs MyPerl.pl
    Use the emacs text editor to edit the program.
chmod 755 MyPerl.pl
    Make the program readable and executable by you, other people in your group and anyone else on the server; only you have write access to the program.
./MyPerl.pl
    Run the perl program 'MyPerl.pl' in the current directory. The first line of the script must name the interpreter, e.g. #!/usr/bin/perl.

LOOPS
for ( $i=0; $i<=$MaxNum; $i++) {}
    Loop variable $i from zero to $MaxNum.

FREQUENTLY USED PERL MODULES
DBI
    Database interface for connection to database servers (MySQL).
Getopt::Std
    Accept single-character command line options (e.g. -i infile).
Term::ANSIColor
    Print in color. Useful for drawing attention to error messages, table headers etc. Example:
        use Term::ANSIColor;
        print color 'bold red';
        print "WARNING\n";
        print color 'reset';
Text::Wrap
    I use this for printing strings of sequence residues that are tabbed over.

EMACS TEXT EDITOR
Emacs is a powerful text editor available on many Linux distributions. Emacs makes heavy use of the Ctrl, Meta (or Alt) and Shift keys. These are indicated below as C, M and S. To launch emacs from the command line simply type:
emacs MyProgram.pl
This will open the file MyProgram.pl for editing in emacs. If the file does not already exist, emacs starts with an empty buffer and creates the file when you first save.

C-h             Online help.
C-g             Stop current operation.

FILES
C-x C-s         Save the current file.
C-x C-w         Save the file under a new name.
C-x C-c         Exit emacs (it asks about unsaved files).
C-x d           Open a directory.
C-x i           Insert another file.

EDIT
Backspace       Delete previous character.
C-k             Kill (cut) to end of the current line.
C-y             Paste.
C-S-_           Undo.
C-w             Cut the marked region.

SEARCH
C-s             Search forward.
C-r             Search backward.

CURSOR MOVEMENT
C-l             Recenter, refresh screen (lowercase L).
C-a             Move to beginning of current line.
C-e             Move to end of the current line.
M-f             Move forward one word.
M-b             Move backward one word.
C-v             Move forward one screen.
M-v             Move back one screen.
M-S-<           Go to the beginning of the buffer.
M-S->           Go to the end of the buffer.
M-x goto-line   Go to line number.

NCBI BLAST
The NCBI Standalone BLAST program is available for download from NCBI: http://www.ncbi.nih.gov/BLAST/download.shtml. (formatdb and blastall belong to the legacy BLAST package; the current NCBI release, BLAST+, uses makeblastdb and programs such as blastn instead.)

formatdb -p F -i MySeqs.fasta -t Seq -n Seq
    Format the fasta file named MySeqs.fasta as a BLAST database (-p F marks the sequences as nucleotide rather than protein). The title (-t) and name (-n) of the database will both be set to Seq. For more options type 'formatdb --help'.
blastall --help
    Display the NCBI BLAST help.
blastall -p program -i infile -d DB -o outfile
  • program is one of:
                 Query           Database
    blastn       Nucleotide      Nucleotide
    blastp       Protein         Protein
    blastx       Trans. Nucl.    Protein
    tblastn      Protein         Trans. Nucl.
    tblastx      Trans. Nucl.    Trans. Nucl.
    (Trans. Nucl. = translated nucleotide)
  • infile is a fasta formatted text file.
  • DB is a blast database created using formatdb.
  • outfile is the path of the output file. I like to give the outfile the *.blo extension to mark it as blast output.
  • A number of other command line options are available for blastall. These include:
    -a    Number of processors to use
    -e    E-value cutoff
    -U    Mask out lowercase letters
    -G    Cost to open a gap
    -E    Cost to extend a gap
    -W    Word size

SFTP
SFTP is a secure file transfer program that comes installed by default with most Linux distributions. Because it encrypts the connection, it is a much safer way to transfer files from the command line than plain ftp.

CONNECTING
sftp ftp.here.edu    or    sftp username@ftp.here.edu
    Connect to the server at the address specified. Without username@, your local user name is used; you will be prompted for your password.
exit
    Quit the SFTP session. Also: quit
help
    Display SFTP help. Also: ?
!
    Escape to the local shell.
!cmd
    Run command 'cmd' in the local shell.

DIRECTORY NAVIGATION
mkdir MyDir
    Create a directory on the ftp server.
lmkdir MyDir
    Create a directory on the local machine.
pwd
    Display the remote working dir.