Daily Quiz 6

● Log into the Moodle site
● Enter the “Lecture 8” area (button 9)
● At 14:00, choose “Daily Quiz 6”
● Answer the multiple-choice quiz (you have 10 minutes to finish)

More text file manipulation: sorting, cutting, pasting, joining, subsetting, …

J. M. P. Alves
Laboratory of Genomics & Bioinformatics in Parasitology
Department of Parasitology, ICB, USP

Editing
● So far we can view text and sometimes even change the output copy a little, but… how about real editing?
● There are lots of text editors for Linux
● There is even rivalry and much teasing between users of one editor (vi) and another (emacs)
  – I myself much prefer emacs
● These are very powerful editors, heavily tailored for programmers but used by many kinds of users

J.M.P. Alves, BMP0260 / ICB5765 / IBI5765

Editing
● Because they are so powerful, and command-line based (no mouse or menus to help you), these programs are very complex and hard to learn
● Fret not, for there is a much simpler, and most of the time good enough, alternative: nano
    nano file
● nano is a small, free and friendly editor which (from the man page) aims to replace Pico, the default editor included in the non-free Pine package
● (Pine was a command-line email client, and included a text editor called Pico; since they were not free software, they could not be included in GNU and other such free systems, so nano was written)

nano
● As the SI prefixes imply, nano is much more than pico, with many more functions and capabilities
● Although it lives in the CLI, nano is sort of window-based (still little to no use for the mouse though)
● It displays a helpful bar at the bottom of the window with the most-used commands
● nano commands are usually a combination of Ctrl and some other key

    nano ~jmalves/bin/average
    nano -Y perl ~jmalves/bin/average
  (the -Y option selects the syntax highlighting to use, here Perl)

nano
● nano is to be used like any simple text-only editor, such as Notepad (in Windows) or Gedit (in Linux)
● Therefore, there are no text decorations (such as italics, bold, different fonts etc.) available
● The main nano commands are:
    Ctrl+x: exit
    Ctrl+o: save file
    Ctrl+k: cut one or more whole lines
    Ctrl+u: paste lines cut by Ctrl+k
    Ctrl+w: search
    Ctrl+g: display the built-in help

Exploring files
● We have already seen a useful command to explore the contents of a text file: wc (for word count)
● wc will tell you some basic statistics about the text file
● Those are:
  ● Number of lines
  ● Number of bytes (or alternatively characters)
  ● Number of words
● A word is (from the man page) a non-zero-length sequence of characters delimited by white space; so fdh236 is considered a word
● By default, wc gives us lines, words, and bytes; we can change that, of course:
    wc -l some_text_file

A unique challenge
● Often you want to see only the unique lines of a file, omitting repeated lines
● In other cases, you want the opposite: see only what occurs more than once
● The command uniq is a simple utility designed to do exactly that
● uniq only compares adjacent lines, so it should be given a sorted text file (we will learn sorting in the next lecture)
● Only one instance of the repeated lines will be shown
● uniq is also a filter, like most Unix commands: takes STDIN, sends to STDOUT
● But it can also be given file names directly (input then output). Examples:
    uniq some_file filtered_file
    head sorted_file | uniq -d

Quiz time!
Go to the course page and choose Quiz 21

Now you do it!
Go to the course site and enter Practical Exercise 22
Follow the instructions to answer the questions in the exercise
Remember: in a PE, you should do things in practice before answering the question!

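The wc and uniq behavior described above can be tried on a throwaway file. This is only a minimal sketch: the file name fruits.txt and the temporary directory are made up for illustration and are not part of the course material.

```shell
# Work in a temporary directory so no real file is touched (hypothetical names).
workdir=$(mktemp -d)
printf 'apple\napple\nbanana\ncherry\ncherry\ncherry\n' > "$workdir/fruits.txt"

# Count lines only; reading from STDIN keeps the file name out of the output.
line_count=$(wc -l < "$workdir/fruits.txt")

# One copy of each line (the input is already sorted, so repeats are adjacent).
unique_lines=$(uniq "$workdir/fruits.txt")

# Only the lines that occur more than once.
repeated_lines=$(uniq -d "$workdir/fruits.txt")

echo "Number of lines: $line_count"
echo "Unique lines:"
echo "$unique_lines"
echo "Repeated lines:"
echo "$repeated_lines"
```

If the input were not sorted, uniq would only collapse repeats that happen to sit next to each other, which is why the slides pair it with sorting.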
Can't make heads or tails of any of this?
● The next two commands are also great ways to quickly explore text file content
● To see just the beginning of a file, use the head command
● By default, head shows us the first ten lines of the data
● head can also show us the first X bytes (or kilobytes, or megabytes etc.) of a file
● ...or all lines (or bytes) except for a certain amount of the final ones
● head is a filter, like most Unix commands: takes STDIN, sends to STDOUT
● Examples:
    head some_file
    ls -lS /usr/bin/ | head

Can't make heads or tails of any of this?
● If head shows us the first X lines or bytes, guess what shows us the last lines or bytes...
● That would be the tail command, of course; it is very similar to head, including most options
● By default, tail shows us the last ten lines of the data
● tail can also show us the last X bytes (or kilobytes, or megabytes etc.) of a file
● ...or all lines (or bytes) except for a certain amount of the first ones
● tail is a filter, like most Unix commands: takes STDIN, sends to STDOUT
● Examples:
    tail some_file
    ls -lS /usr/bin/ | tail

Getting remote data
● We have already used a command-line tool, called wget, to download data from the Web to our local computer
● This program does non-interactive download of files, and it supports the HTTP, HTTPS, and FTP protocols
● At its most basic, you just give wget the exact address of the file you want:
    wget http://www.google.com
● wget can also do recursive downloading: you tell it where to start and how deep it should go, and it downloads all files it finds
● It is also possible to specify certain patterns, like prefixes or file extensions, to accept (ignoring all else) or reject (getting all else)

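Going back to head and tail, the options for line counts, byte counts, and "all except" selections can be sketched as follows. The file numbers.txt is a hypothetical stand-in, and the negative count given to head is assumed to be available because it is a GNU coreutils feature (other systems may not accept it).

```shell
# Build a 20-line test file containing the numbers 1 to 20 (hypothetical name).
workdir=$(mktemp -d)
seq 1 20 > "$workdir/numbers.txt"

# First three lines and last two lines.
first_three=$(head -n 3 "$workdir/numbers.txt")
last_two=$(tail -n 2 "$workdir/numbers.txt")

# GNU head: a negative count means "everything except the last N lines".
all_but_last_17=$(head -n -17 "$workdir/numbers.txt")

# tail: a count prefixed with + means "start at line N", skipping the first N-1 lines.
from_line_19=$(tail -n +19 "$workdir/numbers.txt")

# First four bytes instead of lines: "1", newline, "2", newline.
first_bytes=$(head -c 4 "$workdir/numbers.txt")

echo "$first_three"
echo "$last_two"
```

Both commands also work as filters, so the same options apply when the data arrives through a pipe instead of a file name.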
Getting remote data
● In other cases, we want to transfer data from (or to) another computer that is not a Web server accessible through the browser or an FTP client
● In that case, scp (for secure copy) is the main tool for the job
  – as long as the OpenSSH packages are installed!
● This program behaves a lot like cp, and it can even perform copies within the same computer
● Unlike wget, scp is bidirectional: you can either get data from a remote computer or send data to a remote computer
● Another difference is that with scp you need a user account on the remote computer
● Like cp, scp does not copy directories by default

Getting remote data
● Given its added complexity, scp uses a slightly different command structure:
    scp user1@remote_computer:path_to_file local_path
● This command will get a file (path_to_file) from a remote computer (remote_computer), logging in as user1 (scp will ask for this user’s password), and save it under the path and/or file name local_path
● To do the opposite and send a file to a remote computer, just invert the order of the arguments:
    scp local_path user1@remote_computer:path_to_file
● If you just want to send local_path to your $HOME on the remote computer, the path_to_file part is not needed: leave the command empty after the : character
● Example:
    scp [email protected]:hello.c .

Quiz time!
Go to the course page and choose Quiz 23

Inverse cat
● Last lecture we learned about viewing text with the cat (for concatenate) command
● cat sends the contents of one or more text files (or STDIN) to STDOUT
● But what if you want to get the contents from each file starting from the end instead of the beginning?
● Then, you invert cat and use the tac command

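The difference between cat and tac can be tried with two small files. This is a sketch under assumptions: the file names a.txt and b.txt are invented for illustration, and tac is assumed to be the GNU coreutils version found on most Linux systems.

```shell
# Two small throwaway files (hypothetical names).
workdir=$(mktemp -d)
printf 'first\nsecond\nthird\n' > "$workdir/a.txt"
printf 'alpha\nbeta\n' > "$workdir/b.txt"

# cat: files in the order given, each from its first line to its last.
forward=$(cat "$workdir/a.txt" "$workdir/b.txt")

# tac: files still in the order given, but each one printed from its last line to its first.
backward=$(tac "$workdir/a.txt" "$workdir/b.txt")

echo "cat output:"
echo "$forward"
echo "tac output:"
echo "$backward"
```

Note that tac reverses the lines within each file; the order of the files themselves stays as typed on the command line.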
Inverse cat

cat file
    Program version: 1.0.12
    Date: Thu May 4
    Seed: ./18s
    Seed type: DNA
    Database: db/dma97
    Database type: fastq
    Duplicate headers name: yes
    Expansion direction: both
    Assembler used: abyss-pe
    Number of threads: 4

tac file
    Number of threads: 4
    Assembler used: abyss-pe
    Expansion direction: both
    Duplicate headers name: yes
    Database type: fastq
    Database: db/dma97
    Seed type: DNA
    Seed: ./18s
    Date: Thu May 4
    Program version: 1.0.12

Getting columns
● Tabular files frequently contain much more data than what we are actually interested in
● It is thus sometimes useful to be able to get at just one or a few columns of the data
● The cut command allows us to do exactly that
● cut uses the TAB character as the default column delimiter
● Of course, there is an option (-d) to change that
● Another option (-f) tells the program which column(s) to retrieve

Getting columns
● Let’s try!
● On the remote server, run:
    cut -f 1 /data/column_example
● Notice that only the first column (tab-delimited!) of the file was sent to STDOUT
● Now try:
    cut -f 1 /data/column_example2
    cut -f 1 -d , /data/column_example2
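For readers without access to the course server, the cut commands above can be tried on local stand-in files. The files table.tsv and table.csv below are invented for illustration; they are not the contents of /data/column_example or /data/column_example2, whose format is only assumed here to be tab-delimited and comma-delimited, respectively.

```shell
# Create one tab-delimited and one comma-delimited stand-in file (hypothetical data).
workdir=$(mktemp -d)
printf 'gene1\t120\tchr1\ngene2\t300\tchr2\n' > "$workdir/table.tsv"
printf 'gene1,120,chr1\ngene2,300,chr2\n' > "$workdir/table.csv"

# First column, using the default TAB delimiter.
first_col_tsv=$(cut -f 1 "$workdir/table.tsv")

# Same command on the comma file: no TAB is found, so cut prints each line whole.
no_delimiter_found=$(cut -f 1 "$workdir/table.csv")

# Telling cut that the delimiter is a comma gives the first column again.
first_col_csv=$(cut -f 1 -d , "$workdir/table.csv")

# Several columns at once: fields 1 and 3, still separated by a TAB in the output.
cols_1_and_3=$(cut -f 1,3 "$workdir/table.tsv")

echo "$first_col_tsv"
echo "$no_delimiter_found"
echo "$first_col_csv"
echo "$cols_1_and_3"
```

The second command shows why the delimiter matters: when a line does not contain the delimiter at all, cut passes it through unchanged instead of reporting an error.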
