
Working with files

CISC3130, Spring 2013 X. Zhang

Outline
- Finish up with awk: pipeline, external commands
- Commands working with files
  - tree, ls (-d option, -1 option, -R, -a)
  - od (octal dump), stat (show metadata of a file)
  - touch command, temporary files, files with random bytes
  - File checksums, verification
  - locate, type, which, find: finding files

Some useful tips
- Bash keeps a history of the commands you type
  - Use the UP/DOWN arrow keys to browse them
  - Use "history" to list past commands
- Repeat a previous command
  - !<number>: rerun the command with that history number, e.g., !239
  - !<prefix>: rerun the most recent command starting with that prefix, e.g., !g++
- Search for a command
  - Type Ctrl-r, then a string; Bash searches previous commands for a match
- File name autocompletion: press the Tab key

Output from awk to a pipeline: pipe_mail.awk

    #!/bin/awk -f
    BEGIN {
        FS = ":"
        ## generate a temporary file
        "mktemp /tmp/prog.XXXXXXXX" | getline tmpfile
        print "temp file is: ", tmpfile
    }
    {
        # select username for users using bash
        if ($7 ~ "/bin/bash")
            print $1 >> tmpfile
    }
    END {
        close(tmpfile)
        ## send an email to every bash user
        while ((getline < tmpfile) > 0) {
            cmd = "mail -s Fellow_BASH_USER " $0
            print "Hello," $0 | cmd
            close(cmd)    ## close each pipe so the mail is sent now and pipes do not pile up
        }
    }

Todo:
1.
2.
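Both pipe directions can be tried without sending any mail. The sketch below is a hypothetical variant, not the original exercise: it feeds a few made-up passwd-style records to awk, reads a command's output with `"cmd" | getline var`, and writes matching names to `sort -r` (standing in for `mail`) with `print | cmd`, closing each pipe when done.

```bash
#!/bin/bash
# Hypothetical sample records in /etc/passwd format:
# user:password:uid:gid:gecos:home:shell
sample='alice:x:1001:1001::/home/alice:/bin/bash
bob:x:1002:1002::/home/bob:/bin/sh
carol:x:1003:1003::/home/carol:/bin/bash'

bash_users=$(printf '%s\n' "$sample" | awk '
BEGIN {
    FS = ":"
    # Input pipe: read one line of output from a command into a variable
    "echo bash" | getline shell
    close("echo bash")
}
# Field 7 is the login shell; the regular expression is built at run time
$7 ~ ("/bin/" shell "$") {
    # Output pipe: send each matching user name to an external command
    print $1 | "sort -r"
}
END {
    # Closing the pipe flushes it and waits for sort to finish
    close("sort -r")
}')
echo "$bash_users"
```

Only sort writes to standard output here, so the order of the printed names is determined by `sort -r` alone.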

Execute external commands
- Use the system function (similar to C/C++): system("rm -f tmp") removes a file
- system() returns the command's exit status, so failures can be detected:
    if (system("rm -f tmp") != 0) print "failed to rm tmp"
- A shell is started to run the command line passed as the argument
- The command inherits the awk program's standard input/output/error
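A minimal sketch of checking system()'s return value, assuming an awk (such as gawk in its default mode, or mawk) where system() returns the command's exit status. The commands are harmless stand-ins chosen so that they write nothing to standard output.

```bash
#!/bin/bash
status_report=$(awk 'BEGIN {
    # true exits with status 0
    if (system("true") == 0)
        print "true succeeded"
    # The shell started by system() exits with status 3
    rc = system("exit 3")
    print "exit 3 returned " rc
    # A failing command yields a nonzero status (its error message is discarded)
    if (system("ls /no/such/dir 2>/dev/null") != 0)
        print "ls failed"
}')
echo "$status_report"
```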


What's in a file?
- Files are organized in a hierarchical structure
- Each file has a name, resides under a directory, and is associated with metadata (permissions, owner, timestamps)
- Disk files, virtual files, device files
  - Contents of a disk file: a text (ASCII) file (such as your C/C++ source code), an executable file (commands), a link to other files, …
      ln -s /path/to/file1.txt /path/to/file2.txt
  - The /proc filesystem stores system configuration parameters; it resides in kernel memory
    - Numerical subdirectories exist for every process
  - A device file or special file is an interface for a device driver that appears in a file system as if it were an ordinary file
    - For example, /dev/stdin, /dev/*

What's in a file?
- Recall that in ls -l output, the first character indicates the file type:
  - d directory, - plain file, b block-type special file, c character-type special file, l symbolic link, s socket
- To check the type of a file: file <filename>
- To view an "octal dump" of a file:
    od [OPTION]... [FILE]...
    od --traditional [FILE] [[+]OFFSET [[+]LABEL]]
- Important options:
  - -A: base to use when displaying addresses (default: base 8)
  - -t: how to interpret the file content
    - a: named character; c: ASCII character or backslash escape
    - d[size]: signed decimal, size bytes per integer
    - o[size]: octal; x[size]: hexadecimal

What's in a file?
- Example of od
    $ echo abc def ghi jkl | od -c
    0000000   a   b   c       d   e   f       g   h   i       j   k   l  \n
    0000020
    $ echo abc def ghi jkl | od -Ad -c        ## same as -t c
    0000000   a   b   c       d   e   f       g   h   i       j   k   l  \n
    0000016
    $ echo abc def ghi jkl | od -Ad -t d1     ## interpret each byte as a decimal integer
    0000000   97   98   99   32  100  101  102   32  103  104  105   32  106  107  108   10
    0000016
    $ echo abc def ghi jkl | od -Ad -t x1
    0000000 61 62 63 20 64 65 66 20 67 68 69 20 6a 6b 6c 0a
    0000016
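The same 16 bytes can be dumped in a form that is easy to compare in a script. This sketch assumes a POSIX od; -An suppresses the address column, and tr/sed squeeze od's column padding into single spaces.

```bash
#!/bin/bash
text='abc def ghi jkl'
# Each byte as a signed decimal integer, then as two hex digits
decimal=$(echo "$text" | od -An -t d1 | tr -s ' ' | sed 's/^ //; s/ $//')
hex=$(echo "$text" | od -An -t x1 | tr -s ' ' | sed 's/^ //; s/ $//')
echo "decimal: $decimal"
echo "hex:     $hex"
# Named characters (-t a): spaces appear as sp, the trailing newline as nl
echo "$text" | od -An -t a
```

Note that echo appends a newline, which is why the last byte is 10 (0a) and the dump ends at offset 16 (0000020 in octal).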

Disk space usage
- df: report file system disk space usage
    df [OPTION]... [FILE]...
  - Shows information about the file system on which each FILE resides, or all file systems by default
- du: estimate file space usage
    du [OPTION]... [FILE]...
  - Summarizes disk usage of each FILE, recursively for directories
- quota: display disk usage and limits
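A small sketch contrasting a file's apparent size with the space du reports. It assumes GNU coreutils: `du -b` (apparent size in bytes) is a GNU option, and `df -P` selects the POSIX output format. The scratch directory and file name are hypothetical.

```bash
#!/bin/bash
dir=$(mktemp -d)
head -c 10000 /dev/zero > "$dir/data.bin"     # exactly 10000 bytes of data

# Apparent size in bytes (-b means --apparent-size --block-size=1)
size=$(du -b "$dir/data.bin" | cut -f1)
echo "data.bin: $size bytes"

# Space actually allocated for the whole directory, in 1K blocks
du -sk "$dir"

# Usage of the file system on which the directory resides
df -P "$dir"

rm -r "$dir"
```

The allocated figure from `du -sk` depends on the file system's block size, which is why it can differ from the apparent size.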

Compare file contents
- cmp file1 file2: finds the first place where two files differ (reported as a byte and line number)
- diff file1 file2: reports all lines that are different
  - diff's output is carefully designed so that other programs can use it. For example, revision control systems use diff to manage the differences between successive versions of the files under their management.
- patch command: applies a diff file to an original
    patch [options] [originalfile [patchfile]]
    patch -pnum        ## strip num leading directory components from file names in the patch
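A minimal sketch of cmp and diff on two small, made-up files. Both commands exit with status 1 when the files differ, so the script tests that status explicitly instead of letting it end the script.

```bash
#!/bin/bash
dir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$dir/old.txt"
printf 'one\n2\nthree\n'   > "$dir/new.txt"

# cmp -s prints nothing; only its exit status says whether the files match
if cmp -s "$dir/old.txt" "$dir/new.txt"; then same=yes; else same=no; fi
echo "identical? $same"

# Without -s, cmp reports the first differing byte and line
cmp "$dir/old.txt" "$dir/new.txt" || true

# diff -u writes a unified diff, the format patch consumes
diff -u "$dir/old.txt" "$dir/new.txt" > "$dir/changes.diff" || true
cat "$dir/changes.diff"

# Count removed/added lines, skipping the ---/+++ header lines
changed_lines=$(grep -c '^[-+][^-+]' "$dir/changes.diff")
echo "changed lines: $changed_lines"
rm -r "$dir"
```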

File checksum
- A checksum program computes a single number, a signature, that is characteristic of the file (computed from all of its bytes)
- Files with different contents are unlikely to have the same checksum
- Usage: software announcements include checksums of the distribution files so that users can tell whether a copy matches the original
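The publish-and-verify workflow can be sketched with GNU coreutils' sha256sum, whose -c option re-reads a checksum list and compares it against the files. The file name dist.bin and its contents are hypothetical stand-ins for a real distribution file.

```bash
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1

echo "release contents" > dist.bin          # hypothetical distribution file
sha256sum dist.bin > dist.bin.sha256        # the checksum a publisher would announce
cat dist.bin.sha256

# A user recomputes the checksum and compares it with the announced value
if sha256sum -c dist.bin.sha256; then original=ok; else original=bad; fi

echo "tampered" >> dist.bin                 # simulate a corrupted or altered copy
if sha256sum -c dist.bin.sha256 2>/dev/null; then modified=ok; else modified=bad; fi

echo "original copy: $original, modified copy: $modified"
cd - > /dev/null
rm -r "$work"
```

A checksum alone only detects accidental or careless changes; an attacker who can replace the file can often replace the announced checksum too, which is what the signed distributions below address.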

openssl
- A cryptography toolkit implementing the Secure Sockets Layer and Transport Layer Security network protocols and related cryptography standards
- The openssl program is a command-line tool for using its cryptography functions from the shell:
  - Creation and management of private keys, public keys and parameters
  - Public key cryptographic operations
  - Creation of X.509 certificates, CSRs and CRLs
  - Calculation of message digests
  - Encryption and decryption with ciphers
  - SSL/TLS client and server tests
  - Handling of S/MIME signed or encrypted mail
  - Time stamp requests, generation and verification

Message digest
    openssl dgst [-md5|-md4|-md2|-sha1|-sha|-mdc2|-ripemd160|-dss1] [-c] [-d] [-hex] [-binary] [-out filename] [-sign filename] [-keyform arg] [-passin arg] [-verify filename] [-prverify filename] [-signature filename] [-hmac key] [file...]
  or
    [md5|md4|md2|sha1|sha|mdc2|ripemd160] [-c] [-d] [file...]
- Outputs the message digest of the supplied file or files in hexadecimal form

Example
    $ md5sum /bin/l?
    696a4fa5a98b81b066422a39204ffea4  /bin/ln
    cd6761364e3350d010c834ce11464779  /bin/lp
    351f5eab0baa6eddae391f84d0a6c192  /bin/ls
- Output: 32 hexadecimal digits, i.e., 128 bits
- The chance of two different files having identical signatures is 1/2^128 (the book says 1/2^64)
- In 2005, researchers were able to create pairs of PostScript documents and X.509 certificates with the same MD5 checksum. Later that year, MD5's designer Ron Rivest wrote, "md5 and sha1 are both clearly broken (in terms of collision-resistance)."

Public-key cryptography
- Data security using two related keys: a private key, known only to its owner, and a public key, potentially known to anyone
- Examples: RSA, DSA algorithms
- Digital signature (Alice => Bob communication)
  - If Alice wants to sign a letter, she uses her private key to encrypt it. Bob uses Alice's public key to decrypt the signed letter, and can then be confident that only Alice could have signed it, provided that she is trusted not to divulge her private key.
- Secrecy
  - If Alice wants to send a letter to Bob that only he can read, she encrypts it with Bob's public key, and he then uses his private key to decrypt it. As long as Bob keeps his private key secret, Alice can be confident that only Bob can read her letter.

Secure software distribution
- Many software archives include digital signatures that incorporate information from a file checksum as well as from the signer's private key
- How do we verify such signatures?
    $ ls -l coreutils-5.0.*                 ## Show the distribution files
    -rw-rw-r--  1 jones devel 6020616 Apr  2  2003 coreutils-5.0.tar.gz
    -rw-rw-r--  1 jones devel      65 Apr  2  2003 coreutils-5.0.tar.gz.sig
    $ gpg coreutils-5.0.tar.gz.sig          ## Try to verify the signature
    gpg: Signature made Wed Apr 2 14:26:58 2003 MST using DSA key ID D333CBA1
    gpg: Can't check signature: public key not found

Verify using a public key
- Obtain the public key from a public key server
- Add the public key to your key ring
    $ gpg --import temp.key
    gpg: key D333CBA1: public key "Jim Meyering <...>" imported
    gpg: Total number processed: 1
    gpg:               imported: 1
- Verify the digital signature, which now succeeds:
    $ gpg coreutils-5.0.tar.gz.sig
- Online resource: The GNU Privacy Handbook


touch: update modification time
- touch is sometimes used to create empty files whose existence, and possibly whose timestamps, but not contents, are significant:
  - a lock file to indicate that a program is already running, and that a second instance should not be started
  - a file that records a timestamp for later comparison with other files
- Example:
    $ touch -t 197607040000.00 US-bicentennial     ## Create a file with an explicit timestamp
    $ ls -l US-bicentennial                        ## List the file
    -rw-rw-r--  1 jones devel 0 Jul  4  1976 US-bicentennial
    $ touch -r US-bicentennial birthday            ## Copy timestamp to the new birthday file
    $ ls -l birthday                               ## List the new file
    -rw-rw-r--  1 jones devel 0 Jul  4  1976 birthday
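A timestamp file is only useful if something compares against it. This sketch (file names hypothetical) uses the shell's `-nt` ("newer than") test and `find -newer`, which selects files modified more recently than a reference file.

```bash
#!/bin/bash
dir=$(mktemp -d)
touch -t 197607040000.00 "$dir/US-bicentennial"    # explicit timestamp
touch -r "$dir/US-bicentennial" "$dir/birthday"    # copy the timestamp
touch "$dir/now"                                   # current time

# -nt compares modification times
if [ "$dir/now" -nt "$dir/birthday" ]; then newer=yes; else newer=no; fi
echo "now newer than birthday? $newer"

# Files modified after the reference file (birthday has the same time, so it is excluded)
recent=$(cd "$dir" && find . -type f -newer US-bicentennial)
echo "newer than reference: $recent"
rm -r "$dir"
```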

Temporary files
- So far, we have created temporary files in the current directory and removed them after use
- What if multiple scripts use the same file name? Or malicious users modify the files?
- Special directories: /tmp (typically cleared when the system reboots) and /var/tmp
- To avoid file name collisions, append the process ID as a suffix:
    ## create a temporary file name in a shell script
    tmpfile=temp.$$        ## $$ expands to the shell's process ID
    echo $tmpfile

mktemp command
- mktemp takes an optional file name template ending in a string of X characters, preferably at least a dozen of them
- mktemp replaces them with an alphanumeric string derived from random numbers and the process ID, creates the file with no access for group and others, and prints the file name on standard output

    $ TMPFILE=`mktemp /tmp/myprog.XXXXXXXXXXXX` || exit 1    ## Make a unique temporary file
    $ ls -l $TMPFILE                                         ## List the temporary file
    -rw-------  1 jones devel 0 Mar 17 07:30 /tmp/myprog.hJmNZbq25727
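Unlike temp.$$, whose name another user can predict, mktemp picks an unguessable name and creates the file atomically. A common companion idiom, sketched here, is a trap that removes the file however the script exits. The template /tmp/myprog.XXXXXXXXXXXX follows the slide; `stat -c %a` assumes GNU coreutils.

```bash
#!/bin/bash
TMPFILE=$(mktemp /tmp/myprog.XXXXXXXXXXXX) || exit 1
trap 'rm -f "$TMPFILE"' EXIT          # remove the file when the script exits

ls -l "$TMPFILE"                       # mode is -rw------- (owner only)
perms=$(stat -c %a "$TMPFILE")         # permission bits in octal
echo "some data" > "$TMPFILE"
lines=$(wc -l < "$TMPFILE")
echo "permissions: $perms, lines: $lines"
```

For a whole scratch directory, `mktemp -d` works the same way.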

Random bytes
- Many Unix systems provide two random pseudodevices: /dev/random and /dev/urandom
- These devices serve as never-empty streams of random bytes; such a data source is needed in many cryptographic and security applications
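A short sketch of reading a fixed number of random bytes and formatting them with od. It reads /dev/urandom because /dev/random may block on some systems when the kernel's entropy estimate is low.

```bash
#!/bin/bash
# 16 random bytes shown as 32 hex digits, e.g., for a random key or file name
hex=$(head -c 16 /dev/urandom | od -An -t x1 | tr -d ' \n')
echo "random key: $hex"

# One random unsigned 32-bit integer (-N4: read 4 bytes, -t u4: one 4-byte unsigned value)
num=$(od -An -N4 -t u4 /dev/urandom | tr -d ' ')
echo "random number: $num"
```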


Search for files
- locate: find files by name, using a regularly updated database built from complete scans of the file system
    locate [OPTION]... PATTERN...
- which: display the full path name of a command, using the PATH variable
    $ which rm
    rm='rm'
    /bin/rm
- type: a shell built-in command that shows how each name would be interpreted if used as a command name
  - -t option: report whether a name is an alias, shell reserved word, function, builtin, or disk file

find command
- find [ files-or-directories ] [ options ]: find files matching specified name patterns, or having given attributes
  - -atime n: select files with access times of n days (similarly -ctime, -mtime)
  - -ls: produce a listing similar to the ls long form, rather than just file names
  - -name 'pattern': select files matching the shell wildcard pattern (quoted to protect it from shell interpretation)
  - -perm mask: select files matching the specified octal permission mask
  - -prune: do not descend recursively into directory trees
  - -size n: select files of size n (in 512-byte blocks by default)
  - -type t: select files of type t, a single letter: d (directory), f (file), or l (symbolic link)

find: basic operations
- find [ files-or-directories ] [ options ]
  - files-or-directories: the files and directories to search (directories are (almost) always descended into recursively)
  - options: select names for ultimate display or action

- When find encounters a file, it first applies the selection restrictions implied by the options; if those tests succeed, it hands the name off to an internal action routine
  - Default action: print the name on standard output
  - -exec option: provides a command template into which the name is substituted, and the command is then executed
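The selection-then-action model can be seen in a small scratch tree (file names hypothetical). In the -exec template, {} is replaced by each selected name and \; marks the end of the command.

```bash
#!/bin/bash
dir=$(mktemp -d)
mkdir "$dir/src"
printf 'int main(void) { return 0; }\n' > "$dir/src/main.c"
printf '# notes\n' > "$dir/README"
touch "$dir/src/empty.c"

# Selection: regular files named *.c; action: run wc -l on each one
find "$dir" -type f -name '*.c' -exec wc -l {} \;

# Same selection with the default action (print the name)
c_files=$(cd "$dir" && find . -type f -name '*.c' | sort)
echo "$c_files"
rm -r "$dir"
```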

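find usage examples
- find: display all files/directories under the current directory
- find -ls: display files/directories in "ls" style
- find * -prune: list the names in the current directory without descending into subdirectories
- find $HOME/. ! -user $USER: list files under your home directory that you do not own
- find -ls -type f -fprint /tmp/mytemp
    $ find -ls -type f -fprint /tmp/mytemp
    23724924    4 drwxr-xr-x   2 zhang    staff        4096 Mar 25 22:40 .
    23724925    0 --wx------   1 zhang    staff           0 Mar 25 22:35 ./a
    23724927    0 -rw-r--r--   1 zhang    staff           0 Mar 25 22:35 ./b
    23724928    4 -rw-r--r--   1 zhang    staff          10 Mar 25 22:40 ./tmp
    $ cat /tmp/mytemp
    ./a
    ./b
    ./tmp
  - find evaluates its expression from left to right: -ls lists every entry (including the directory .), while only entries that also pass -type f are written to /tmp/mytemp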

find: examples
- Files that haven't been modified in the last year:
    find . -mtime +365
  - Unsigned integer: exactly that many days old
  - Negative: less than that absolute value
  - Positive: more than that value
- Files that the user has write permission for:
    find . -perm -200        ## all bits set in the mask must match
  - The permission mask is given as an octal string
    - Unsigned: an exact match on the permissions is required
    - Negative: all of the bits set are required to match
    - Positive: at least one of the bits set must match, e.g., +700: user can read, or write, or execute (current GNU find expects /700 for this; the + form is deprecated)
- Files that the user does not have read permission for:
    find . ! -perm -400
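The three forms of -perm can be compared on files with known modes. This sketch assumes GNU find (for the /mode "any bit" form); the file names are hypothetical.

```bash
#!/bin/bash
dir=$(mktemp -d)
cd "$dir" || exit 1
touch rw r x
chmod 600 rw      # rw-------
chmod 400 r       # r--------
chmod 100 x       # --x------

# -perm -200: every bit in 200 (owner write) must be set
writable=$(find . -type f -perm -200 | sort)
# ! -perm -400: the owner read bit is not set
unreadable=$(find . -type f ! -perm -400 | sort)
# -perm /300: at least one of owner write, owner execute is set
w_or_x=$(find . -type f -perm /300 | sort)

echo "writable: $writable"
echo "unreadable: $unreadable"
echo "write or execute:" $w_or_x
cd - > /dev/null
rm -rf "$dir"
```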

find: selectors
- Selector options can be combined: all must match for the action to be taken
  - They may be joined explicitly with the -a (AND) option
  - -o (OR) option: at least one selector of the surrounding pair must match
- Find nonempty files smaller than 10 blocks (5120 bytes):
    $ find . -size +0 -a -size -10
- Find files that are empty or have not been read in the past year:
    $ find . -size 0 -o -atime +365
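A sketch of AND and OR selectors on files of known sizes (names hypothetical; sizes are in 512-byte blocks, rounded up, as in GNU find). When -o is mixed with other tests, escaped parentheses make the grouping explicit; without them, `-type f -size 0 -o -size +10` would parse as `(-type f -a -size 0) -o -size +10`.

```bash
#!/bin/bash
dir=$(mktemp -d)
cd "$dir" || exit 1
: > empty.txt                         # 0 bytes
head -c 100 /dev/zero > small.bin     # 100 bytes, i.e., 1 block
head -c 10000 /dev/zero > big.bin     # 10000 bytes, i.e., 20 blocks

# Nonempty AND smaller than 10 blocks
small=$(find . -type f -size +0 -a -size -10 | sort)
# Empty OR larger than 10 blocks, grouped before -type f is applied
either=$(find . -type f \( -size 0 -o -size +10 \) | sort)

echo "small: $small"
echo "empty or big:" $either
cd - > /dev/null
rm -rf "$dir"
```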

Usage of find in a shell script

    #!/bin/bash
    …
    ## go to the top-level web site directory first
    find . -name '*.html' -type f |       ## Find all HTML files
    while IFS= read -r file               ## Read a file name into a variable
    do
        echo "$file"                      ## Print progress
        mv "$file" "$file.save"           ## Save a backup copy
        sed -f "$HOME/html2xhtml.sed" < "$file.save" > "$file"    ## Make the change
    done

html2xhtml.sed converts HTML to XHTML (a standardized, XML-based version of HTML): it converts tags to lowercase and changes the <br> tag into the self-closing form <br/>:

    # Slash delimiter
    s/<H1>/<h1>/g
    s/<H2>/<h2>/g
    s/<H3>/<h3>/g
    s/<H4>/<h4>/g
    s/<H5>/<h5>/g
    s/<H6>/<h6>/g
    # Colon delimiter, because the data contains a slash
    s:</H1>:</h1>:g
    s:</H2>:</h2>:g
    ...
    s:<[Bb][Rr]>:<br/>:g
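The backup-and-rewrite loop can be tried safely in a scratch directory. In this miniature, the site layout, the sample pages, and the three-line sed script are hypothetical stand-ins for a real web site and the full html2xhtml.sed.

```bash
#!/bin/bash
site=$(mktemp -d)                     # scratch stand-in for a web site directory
mkdir "$site/sub"
printf '<H1>Title</H1>\nline<BR>\n' > "$site/index.html"
printf '<H1>Sub page</H1>\n'        > "$site/sub/page.html"

# A tiny subset of an HTML-to-XHTML sed script
cat > "$site/html2xhtml.sed" <<'EOF'
s/<H1>/<h1>/g
s:</H1>:</h1>:g
s:<[Bb][Rr]>:<br/>:g
EOF

cd "$site" || exit 1
find . -name '*.html' -type f |
while IFS= read -r file
do
    echo "$file"                                          # progress
    mv "$file" "$file.save"                               # keep a backup
    sed -f "$site/html2xhtml.sed" < "$file.save" > "$file"
done

result=$(cat index.html)
backup=$(cat index.html.save)
cd - > /dev/null
rm -rf "$site"
```

`IFS= read -r` keeps leading blanks and backslashes in file names intact, and quoting "$file" protects names containing spaces.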

Total size of files
    $ find -ls | awk '{ Sum += $7 } END { printf("Total: %.0f bytes\n", Sum) }'
    Total: 23079017 bytes
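The same pipeline on files of known size, as a quick check that field 7 of find -ls output is the size in bytes. Adding -type f, an assumption beyond the slide's command, leaves out the directories' own sizes. File names are hypothetical.

```bash
#!/bin/bash
dir=$(mktemp -d)
head -c 1000 /dev/zero > "$dir/a"
head -c 2345 /dev/zero > "$dir/b"

# find -ls fields: inode, blocks, permissions, links, owner, group, size, date..., name
total=$(find "$dir" -type f -ls | awk '{ Sum += $7 } END { printf("%.0f\n", Sum) }')
echo "Total: $total bytes"
rm -r "$dir"
```

The `%.0f` format prints the sum as a whole number even when it grows too large for awk's integer output format.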

xargs command
- Goal: supply the list returned by find as arguments to another command
- One way is the shell's command substitution feature. E.g., searching for the symbol POSIX_OPEN_MAX in system header files:
    $ grep POSIX_OPEN_MAX /dev/null $(find /usr/include -type f | sort)
    /usr/include/limits.h:#define _POSIX_OPEN_MAX 16
  - Note: why /dev/null here?
- Potential problem: the command line might exceed the system limit, causing an "argument list too long" error
    $ getconf ARG_MAX        ## get system configuration values
    2097152

xargs command
- xargs takes a list of arguments from standard input, separated by blanks or newlines, and feeds them in suitably sized groups (determined by ARG_MAX) to another command given as arguments to xargs

    $ find /usr/include -type f | xargs grep POSIX_OPEN_MAX /dev/null
    /usr/include/bits/posix1_lim.h:#define _POSIX_OPEN_MAX 16
    /usr/include/bits/posix1_lim.h:#define _POSIX_FD_SETSIZE _POSIX_OPEN_MAX
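Because xargs splits its input at blanks and newlines, a file name containing a space is broken into two arguments. A common remedy, sketched below assuming a find and xargs that support -print0 and -0 (GNU and BSD both do), separates names with NUL bytes. The header directory and the MY_LIMIT symbol are hypothetical.

```bash
#!/bin/bash
dir=$(mktemp -d)
mkdir "$dir/include"
printf '#define MY_LIMIT 16\n' > "$dir/include/limits one.h"   # name contains a space
printf '/* nothing here */\n'  > "$dir/include/other.h"

# NUL-separated names survive spaces; /dev/null guarantees grep always sees
# at least two files, so it prefixes each match with the file name
matches=$(find "$dir/include" -type f -print0 | xargs -0 grep MY_LIMIT /dev/null)
echo "$matches"
rm -r "$dir"
```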

Code Studies: filesdirectories

Summary
- Finish up with awk: pipeline, external commands
- Commands working with files
  - tree, ls (-d option, -1 option, -R, -a)
  - od (octal dump), stat (show metadata of a file)
  - touch command, temporary files, files with random bytes
  - File checksums, verification
  - locate, type, which, find: finding files
