
● Log into the Moodle site
● Enter the “Lecture 8” area (button 9)
● At 14:00, choose “Daily Quiz 6”
● Answer the multiple choice quiz (you have up to 10 min to finish)

More text file manipulation: sorting, cutting, pasting, joining, subsetting, …
J. M. P. Alves
Laboratory of Genomics & Bioinformatics in Parasitology
Department of Parasitology, ICB, USP

Editing
● Everything is very nice so far: we can view, and sometimes even slightly change, the output copy of the text, but… how about real editing?
● There are lots of text editors for Linux
● There is even rivalry and much teasing between users of one editor (vi) and another (emacs) – I myself much prefer emacs
● These are very powerful editors, heavily tailored for programmers but used by many kinds of users

J.M.P. Alves 3 / 43 BMP0260 / ICB5765 / IBI5765

Editing
● Because they are so powerful, and command-line based (no mouse or menus to help you), these programs are very complex and hard to learn
● Fret not, for there is a much simpler, and most of the time good enough, alternative: nano
    nano file
● nano is a small, free and friendly editor which (from the man page) aims to replace Pico, the default editor included in the non-free Pine package
● (Pine was a command-line email client, and included a text editor called Pico; since they were not free software, they could not be included in GNU and other such free systems, so nano was written)

nano
● As the SI prefixes imply, nano is much more than pico, with many more functions and capabilities
● Although it lives in the CLI, nano is sort of window-based (still little to no use for the mouse though)
● It displays a helpful bar at the bottom of the window with the most-used commands
● nano commands are usually a combination of Ctrl and some other key

    nano ~jmalves/bin/average

    nano -Y perl ~jmalves/bin/average

nano
● nano is to be used like any simple text-only editor, such as Notepad (in Windows) or Gedit (in Linux)
● Therefore, there are no text decorations (such as italics, bold, different fonts etc.) available
● The main nano commands are:
    Ctrl+x: exit
    Ctrl+o: save file
    Ctrl+k: cut one or more whole lines
    Ctrl+u: paste lines cut by Ctrl+k
    Ctrl+w: search
    Ctrl+g: display the built-in help

Exploring files
● We have already seen a useful command to explore the contents of a text file: wc (for word count)
● wc will tell you some basic statistics about the text file
● Those are:
    ● Number of lines
    ● Number of bytes (or alternatively characters)
    ● Number of words
● A word is (from the man page) a non-zero-length sequence of characters delimited by white space; so fdh236 is considered a word
● By default, wc gives us lines, words, and bytes; we can change that, of course:
    wc -l some_text_file

A unique challenge
● Often you want to see only the unique lines of a file, omitting repeated lines
● In other cases, you want the opposite: see only what occurs more than once
● The command uniq is a simple utility designed to do exactly that
● uniq takes a sorted text file (we will learn sorting in the next lecture)
● Only one instance of the repeated lines will be shown
● uniq is also a filter, like most Unix commands: takes STDIN, sends to STDOUT
● But it can also be given file names directly (input then output). Examples:
    uniq some_file filtered_file
    head sorted_file | uniq -d

Quiz time!
Go to the course page and choose Quiz 21

Now you do it!
Go to the course site and enter Practical Exercise 22
Follow the instructions to answer the questions in the exercise
Remember: in a PE, you should do things in practice before answering the question!
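
The wc and uniq behaviour described on the slides above can be tried safely on a throwaway file. This is only a minimal sketch: the file name fruits.txt and its contents are made up for illustration, and sort appears here only because uniq expects sorted input (sorting itself is covered in the next lecture).

```shell
# Work in a scratch directory so nothing important is overwritten
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical example data: five one-word lines, some of them repeated
printf 'banana\napple\nbanana\ncherry\napple\n' > fruits.txt

# Count only lines; reading from STDIN keeps the file name out of the output
line_count=$(wc -l < fruits.txt)

# Count only words
word_count=$(wc -w < fruits.txt)

# uniq only compares neighbouring lines, so the input is sorted first
unique_lines=$(sort fruits.txt | uniq)

# -d shows only the lines that occur more than once
repeated_lines=$(sort fruits.txt | uniq -d)

echo "lines: $line_count, words: $word_count"
echo "unique lines:"
echo "$unique_lines"
echo "repeated lines:"
echo "$repeated_lines"
```

Running uniq on the unsorted file instead would leave the repeated lines in place, because none of the repeats are adjacent.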

Can't make heads or tails of any of this?
● The next two commands are also great ways to quickly explore text file content
● To see just the beginning of a file, use the head command
● By default, head shows us the first ten lines of the data
● head can also show us the first X bytes (or kilobytes, or megabytes etc.) of a file
● ...or all lines (or bytes) except for a certain number of the final ones
● head is a filter, like most Unix commands: takes STDIN, sends to STDOUT
● Examples:
    head some_file
    ls -lS /usr/bin/ | head

Can't make heads or tails of any of this?
● If head shows us the first X lines or bytes, guess what shows us the last lines or bytes...
● That would be the tail command, of course; it is very similar to head, including most options
● By default, tail shows us the last ten lines of the data
● tail can also show us the last X bytes (or kilobytes, or megabytes etc.) of a file
● ...or all lines (or bytes) except for a certain number of the first ones
● tail is a filter, like most Unix commands: takes STDIN, sends to STDOUT
● Examples:
    tail some_file
    ls -lS /usr/bin/ | tail

Getting remote data
● We have already used a command-line tool, called wget, to download data from the Web to our local computer
● This program does non-interactive download of files, and it supports the HTTP, HTTPS, and FTP protocols
● At its most basic, you just give wget the exact address of the file you want:
    wget http://www.google.com
● wget can also do recursive downloading: you tell it where to start and how deep it should go, and it downloads all files it finds
● It is also possible to specify certain patterns, like prefixes or file extensions, to accept (ignoring all else) or reject (getting all else)
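
Going back to head and tail for a moment: the sketch below tries the line and byte options mentioned on those two slides on a made-up twelve-line file. The file name numbers.txt is hypothetical, and the negative count given to head -n is a GNU coreutils feature that may not exist in other implementations.

```shell
# Work in a scratch directory so nothing important is overwritten
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical example data: the numbers 1 to 12, one per line
seq 1 12 > numbers.txt

# First 3 lines instead of the default 10
first_three=$(head -n 3 numbers.txt)

# Last 2 lines instead of the default 10
last_two=$(tail -n 2 numbers.txt)

# First 4 bytes of the file: "1", newline, "2", newline
first_bytes=$(head -c 4 numbers.txt)

# All lines except the last 10 (negative counts are a GNU head feature)
all_but_last_ten=$(head -n -10 numbers.txt)

# Start output at line 11, that is, skip the first 10 lines
from_line_eleven=$(tail -n +11 numbers.txt)

echo "first three:";      echo "$first_three"
echo "last two:";         echo "$last_two"
echo "first four bytes:"; echo "$first_bytes"
echo "all but last ten:"; echo "$all_but_last_ten"
echo "from line eleven:"; echo "$from_line_eleven"
```

Note that tail -n +11 counts from the start of the file: the number is the first line to print, not the number of lines to skip.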

Getting remote data
● In other cases, we want to transfer data from (or to) another computer that is not a Web server accessible through the browser or an FTP client
● In that case, scp (for secure copy) is the main tool for the job – as long as the OpenSSH packages are installed!
● This program behaves a lot like cp, and it can even perform copies within the same computer
● Unlike wget, scp is bidirectional: you can either get data from a remote computer or send data to a remote computer
● Another difference is that with scp you need a user account on the remote computer
● Like cp, scp does not copy directories by default

Getting remote data
● Given its added complexity, scp uses a slightly different command structure:
    scp user1@remote_computer:path_to_file local_path
● This command will get a file (path_to_file) from a remote computer (remote_computer), using user1 as the user name – scp will ask for this user’s password – and save the file in the path and/or file name local_path
● To do the opposite and send a file to a remote computer, just invert the order of the arguments:
    scp local_path user1@remote_computer:path_to_file
● If you just want to send local_path to your $HOME on the remote computer, the path_to_file part is not needed: leave the command empty after the : character
● Example:
    scp [email protected]:hello.c .

Quiz time!
Go to the course page and choose Quiz 23

Inverse cat
● Last lecture we learned about viewing text with the cat (for concatenate) command
● cat sends the contents of one or more text files (or STDIN) to STDOUT
● But what if you want to get the contents of each file starting from the end instead of the beginning?
● Then, you invert cat and use the tac command

Inverse cat
    cat file
Program version: 1.0.12
Date: Thu May 4
Seed: ./18s
Seed type: DNA
Database: db/dma97
Database type: fastq
Duplicate headers name: yes
Expansion direction: both
Assembler used: abyss-pe
Number of threads: 4

    tac file
Number of threads: 4
Assembler used: abyss-pe
Expansion direction: both
Duplicate headers name: yes
Database type: fastq
Database: db/dma97
Seed type: DNA
Seed: ./18s
Date: Thu May 4
Program version: 1.0.12

Getting columns
● Tabular files frequently contain much more data than what we are actually interested in
● It is thus sometimes useful to be able to get at just one or a few columns of the data
● The cut command allows us to do exactly that
● cut uses the TAB character as the default column delimiter
● Of course, there is an option (-d) to change that
● Another option (-f) tells the program which column(s) to retrieve

Getting columns
● Let’s try!
● On the remote server, run:
    cut -f 1 /data/column_example
● Notice that only the first column (tab-delimited!) of the file was sent to STDOUT
● Now try:
    cut -f 1 /data/column_example2
    cut -f 1 -d , /data/column_example2
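
For practice away from the remote server, the same cut experiment can be sketched on small local files. The files table.tsv and table.csv and their contents are hypothetical stand-ins, since the contents of /data/column_example and /data/column_example2 are not shown here; the second file is simply assumed to be comma-delimited.

```shell
# Work in a scratch directory so nothing important is overwritten
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical tab-delimited table: gene, length, organism
printf 'geneA\t1200\tT. cruzi\ngeneB\t850\tL. major\n' > table.tsv

# The same made-up data, comma-delimited
printf 'geneA,1200,T. cruzi\ngeneB,850,L. major\n' > table.csv

# Column 1, using the default TAB delimiter
first_col_tsv=$(cut -f 1 table.tsv)

# No TABs in the comma file, so cut finds no delimiter and prints whole lines
no_delim=$(cut -f 1 table.csv)

# Telling cut that the delimiter is a comma gives the expected column
first_col_csv=$(cut -f 1 -d , table.csv)

# Several columns can be requested at once: here columns 1 and 3
cols_1_3=$(cut -f 1,3 -d , table.csv)

echo "$first_col_tsv"
echo "$no_delim"
echo "$first_col_csv"
echo "$cols_1_3"
```

The second command is the one to watch: when the delimiter is wrong, cut does not fail, it just returns every line unchanged.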