Scripting Languages

COMS W4115, Fall 2004
Prof. Stephen A. Edwards
Department of Computer Science

Higher-level languages.
More compact for small programs.
Often not suitable for large systems.
The more popular ones: Awk, Perl, Python, Tcl, Bourne Shell

Awk

Named for its three developers: Alfred Aho, Peter Weinberger, Brian Kernighan
Good for data file manipulation. I use it for computing grades and other simple tasks.

Simple Awk Program

Input file. Each line a record. Space-separated fields: employee, pay rate, hours worked

    Beth 10.0 0
    Dan 9.75 0
    Kathy 14.0 10
    Mark 10.0 20
    Susie 8.25 18

Run on the awk program

    $3 > 0 { print $1, $2 * $3 }

produces

    Kathy 140
    Mark 200
    Susie 148.5

Here $3 is the 3rd field, "$3 > 0" is the pattern, and "{ print $1, $2 * $3 }" is the action. For the record "Kathy 14.0 10" the pattern matches and the action prints "Kathy 140"; for "Beth 10.0 0" the pattern fails and nothing is printed.

Awk Program Structure

    pattern { action }
    pattern { action }
    ...

awk scans an input file one line at a time and, in order, runs each action whose pattern matches.

Patterns:

    BEGIN, END            True before and after the file.
    expression            Condition
    /regexp/              String pattern match
    pattern && pattern
    pattern || pattern    Boolean operators
    ! pattern
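The pattern/action loop described above can be sketched in Python. This is only an illustration of the control flow, not how awk is implemented; the helper name `run_rules` and the representation of a rule as a pair of functions are assumptions made for this example.

```python
# Sketch of awk's "pattern { action }" loop. run_rules is a hypothetical
# helper; each rule is a (pattern, action) pair of plain Python functions.

def run_rules(lines, rules):
    """Apply every rule, in order, to every input record."""
    output = []
    for line in lines:
        fields = line.split()              # whitespace-separated fields
        for pattern, action in rules:
            if pattern(fields):            # run the action only on a match
                output.append(action(fields))
    return output

records = [
    "Beth 10.0 0",
    "Dan 9.75 0",
    "Kathy 14.0 10",
    "Mark 10.0 20",
    "Susie 8.25 18",
]

# Counterpart of:  $3 > 0 { print $1, $2 * $3 }
# fields[0] is awk's $1, fields[1] is $2, fields[2] is $3.
rules = [
    (lambda f: float(f[2]) > 0,
     lambda f: "%s %g" % (f[0], float(f[1]) * float(f[2]))),
]

result = run_rules(records, rules)
print("\n".join(result))
```

With the five records from the slide, this prints the same three lines as the awk program: Kathy 140, Mark 200, Susie 148.5.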

Awk One-Liners

Print every line

    { print }

Print the first and third fields of each line

    { print $1, $3 }

Print every line with three fields

    NF == 3 { print }

Print a line number before every line

    { print NR, $0 }

Statistics in Awk

    #!/bin/awk -f
    BEGIN { n = 0; s = 0; ss = 0;}
    NF == 1 { n++; s += $1; ss += $1 * $1; }
    END {
      print n " data points"
      m = (s+0.0) / n; print m " average"
      sd = sqrt( (ss - n * m * m) / ( n - 1.0))
      print sd " standard deviation"
    }

Run on

    1
    5
    10
    3
    7
    11

gives

    6 data points
    6.16667 average
    3.92003 standard deviation

Associative Arrays: Word Counter

    {
      gsub(/[.,:;!(){}]/, "")   # remove punctuation
      for ( i = 1 ; i <= NF ; i++ )
        count[$i]++
    }
    END {
      for (w in count)
        print count[w], w | "sort -rn"
    }

Run on the Tiger reference manual produces

    103 the
    58 of
    51 is
    49 and
    49 a
    35 expression
    32 The
    29 =

Perl

Larry Wall's
Practical Extraction and Report Language
or
Pathologically Eclectic Rubbish Lister

Larger, more flexible language than Awk. Good for text processing and other tasks.
Strange semantics. Heinous syntax.
Excellent regular-expression support. More complicated data structures possible (even classes).

Wordcount in Perl

    #!/usr/bin/perl
    while(<>) {
      chop;
      s/[.,:;!(){}]//g;
      @words = split;
      foreach (@words) {
        $count{$_}++;
      }
    }
    open(SORTER, "| sort -nr");
    foreach (keys %count) {
      print SORTER
        $count{$_}, " ", $_,"\n";
    }

Understandable wordcount in Perl

    #!/usr/bin/perl
    while($line = <>) {
      chop($line);
      $line =~ s/[.,:;!(){}]//g;
      @words = split(/\s+/, $line);
      foreach $word (@words) {
        $count{$word}++;
      }
    }
    open(SORTER, "| sort -nr");
    foreach $word (keys %count) {
      print SORTER
        $count{$word}, " ", $word,"\n";
    }
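The "Statistics in Awk" program keeps three running totals (count, sum, sum of squares) and derives the mean and sample standard deviation from them at the end. A minimal Python sketch of the same arithmetic follows; the function name `summarize` is an assumption made for this example, and the data are the six values from the slide.

```python
import math

def summarize(values):
    """Mean and sample standard deviation from running sums."""
    n = 0        # number of data points   (awk: n)
    s = 0.0      # sum of the values       (awk: s)
    ss = 0.0     # sum of squared values   (awk: ss)
    for x in values:
        n += 1
        s += x
        ss += x * x
    mean = s / n
    # Same formula as the awk END block: (ss - n*m*m) / (n - 1)
    sd = math.sqrt((ss - n * mean * mean) / (n - 1))
    return n, mean, sd

n, mean, sd = summarize([1, 5, 10, 3, 7, 11])
print("%d data points" % n)
print("%g average" % mean)
print("%g standard deviation" % sd)
```

The `%g` format rounds to six significant digits, which is why the printed values agree with the awk output on the slide (6.16667 and 3.92003). Note that the sum-of-squares formula can lose precision when the values are large and close together; it is used here only because the slide uses it.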

"There's more than one way to do it"

Perhaps too many. Equivalent ways to print STDIN:

    while (<STDIN>) { print; }
    print while <STDIN>
    print while <>
    while (defined($_ = <STDIN>)) { print $_; }
    for (;<STDIN>;) { print; }
    print $_ while defined($_ = <STDIN>);

Many Perl statements come in prefix and postfix form

    while (...) ...
    ... while ...
    if (...) ...
    ... if ...
    ... unless ...

So Why Perl?

Perhaps the most popular scripting language.
Despite its flaws, it's very powerful.
Almost has a good type system.
Very few things can't be done in Perl.
Fast, flexible interpreter.
Ability to make virtually every system call. Binary data manipulation.
Ported everywhere.
Very, very extensive collection of libraries. Database access. CGI/HTML for the web. Math. IPC. Time.

Python

Perl designed by a sane man.
Very clean syntax and semantics.
Large collection of libraries (but not as big as Perl's).
Regular expression support (but not as integrated as Perl's).

Wordcount in Python

    #!/usr/bin/env python

    import fileinput, re, string, os

    count = {}
    for line in fileinput.input():
        line = re.sub(r'[.,:;!(){}]',"",line)
        for word in string.split(line):
            if not count.has_key(word):
                count[word] = 1
            else:
                count[word] = count[word] + 1

    f = os.popen("sort -nr",'w')
    for word in count.keys():
        f.write('%d %s\n' % (count[word], word) )

Python Classes

    class Complex:
        def __init__(self, realpart, imagpart):
            self.r = realpart
            self.i = imagpart
        def add(self, a):
            self.r = self.r + a.r
            self.i = self.i + a.i
        def p(self):
            print "%g + %gi" % (self.r,self.i)

    x = Complex(1,2)
    y = Complex(2,3)
    x.p()
    x.add(y)
    x.p()

Prints

    1 + 2i
    3 + 5i

Python's Merits

Good support for programming-in-the-large:
    Packages with separate namespaces; Exceptions; Classes
    Persistent datastructures (pickling)
High-level: lists, strings, associative arrays, iterators
Good collection of libraries:
    Operating-system access (files, directories, etc.); String manipulation; Curses; Databases; Networking (CGI, HTTP, URL, mail/Mime, HTML); Tk; Cryptography; System-specific (Windows, Mac, SGI, POSIX)

Python vs. Perl

Python can be the more verbose language, but Perl can be cryptic.
Regular expression support more integrated with language in Perl.
Perl better-known.
Probably comparable execution speeds.
More "tricks" possible in Perl; Python more disciplined.
Python has the much cleaner syntax and semantics; I know which language's programs I'd rather maintain.

Tcl

John Ousterhout's Tool Command Language was originally intended to be grafted on to an application to make it controllable.
It has since become a general-purpose scripting language. Its syntax is quite simple, although rather atypical.
Tk, a Tcl package, provides graphical user interface widgets. Tcl/Tk may be the easiest way to write a GUI.
Tk has been connected to Perl and Python as well.

Tcl Syntax

Shell-like command syntax:

    command argument argument ...

All data is strings (incl. numbers and lists)

Macro-like variable substitution:

    set foo "123 abc"
    bar 1 $foo 3

Command substitution:

    set foo 1
    set bar 2
    puts [expr {$foo + $bar}] ; # Print 3
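The "Wordcount in Python" slide uses idioms from the Python of its day (`has_key`, `string.split`, and, on the classes slide, the `print` statement), none of which exist in Python 3. As a rough sketch of the same algorithm in current Python, assuming `collections.Counter` for the tallying and an in-process sort instead of the pipe to `sort -nr`; the names `word_counts`, `report`, and `sample` are made up for this example.

```python
import re
from collections import Counter

# Same punctuation set as the slide's regular expression.
PUNCTUATION = re.compile(r"[.,:;!(){}]")

def word_counts(lines):
    """Count whitespace-separated words after stripping punctuation."""
    counts = Counter()
    for line in lines:
        counts.update(PUNCTUATION.sub("", line).split())
    return counts

def report(counts):
    """Format 'count word' lines, highest count first, ties by word."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ["%d %s" % (n, word) for word, n in ordered]

sample = [
    "The cat (the big one) sat.",
    "The dog sat; the cat ran!",
]

counts = word_counts(sample)
print("\n".join(report(counts)))
```

Any iterable of lines works as input, so passing `sys.stdin` or an open file in place of `sample` would process real text. The tie-breaking order is a choice made here for reproducibility and is not claimed to match what `sort -nr` does with equal counts.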

Wordcount in Tcl

    #!/usr/bin/env tclsh

    while {[gets stdin line] >= 0} {
      regsub -all {[.,:;!(){}]} $line "" line
      foreach word $line {
        if {![info exists count($word)]} {
          set count($word) 1
        } else {
          incr count($word)
        }
      }
    }

    set f [open "| sort -rn" w]
    foreach word [array names count] {
      puts $f "$count($word) $word"
    }

Nifty Tcl Features

Associative arrays

    set count(Stephen) 1

Lists

    lappend foo 1
    lappend foo 2
    foreach i $foo { puts $i } ; # print 1 then 2

Procedures

    proc sum3 {a b c} {
      return [expr $a + $b + $c]
    }

Tk

"Hello World" in Tk.

    button .b -text "Hello World" -command "exit"
    pack .b

An Editable Graph

    # Set up the main window
    set w .plot
    catch {destroy $w}
    toplevel $w
    wm title $w "Plot Demonstration"
    wm iconname $w "Plot"
    positionWindow $w
    set c $w.c

    # Text description at top
    label $w.msg -font $font -wraplength 4i -justify left \
      -text "This window displays a canvas widget containing
    a simple 2-dimensional plot. You can doctor the data
    by dragging any of the points with mouse button 1."
    pack $w.msg -side top

    # Set up bottom control buttons
    frame $w.buttons
    pack $w.buttons -side bottom -fill x -pady 2m
    button $w.buttons.dismiss -text Dismiss -command "destroy $w"
    button $w.buttons.code -text "See Code" -command "showCode $w"
    pack $w.buttons.dismiss $w.buttons.code -side left -expand 1

    # Set up graph itself
    canvas $c -relief raised -width 450 -height 300
    pack $w.c -side top -fill x

    # Draw axes
    set plotFont {Helvetica 18}
    $c create line 100 250 400 250 -width 2
    $c create line 100 250 100 50 -width 2
    $c create text 225 20 -text "A Simple Plot" -font $plotFont \
      -fill brown

    # Draw axis labels
    for {set i 0} {$i <= 10} {incr i} {
      set x [expr {100 + ($i*30)}]
      $c create line $x 250 $x 245 -width 2
      $c create text $x 254 -text [expr 10*$i] \
        -anchor n -font $plotFont
    }
    for {set i 0} {$i <= 5} {incr i} {
      set y [expr {250 - ($i*40)}]
      $c create line 100 $y 105 $y -width 2
      $c create text 96 $y -text [expr $i*50].0 \
        -anchor e -font $plotFont
    }

    # Draw points
    foreach point {{12 56} {20 94} {33 98} {32 120} {61 180}
        {75 160} {98 223}} {
      set x [expr {100 + (3*[lindex $point 0])}]
      set y [expr {250 - (4*[lindex $point 1])/5}]
      set item [$c create oval [expr $x-6] [expr $y-6] \
        [expr $x+6] [expr $y+6] -width 1 -outline black \
        -fill SkyBlue2]
      $c addtag point withtag $item
    }

    # Bind actions to events
    $c bind point <Any-Enter> "$c itemconfig current -fill red"
    $c bind point <Any-Leave> "$c itemconfig current -fill SkyBlue2"
    $c bind point <1> "plotDown $c %x %y"
    $c bind point <ButtonRelease-1> "$c dtag selected"
    bind $c <B1-Motion> "plotMove $c %x %y"
    set plot(lastX) 0
    set plot(lastY) 0

    proc plotDown {w x y} { # Called when point clicked
      global plot
      $w dtag selected
      $w addtag selected withtag current
      $w raise current
      set plot(lastX) $x
      set plot(lastY) $y
    }

    proc plotMove {w x y} { # Called when point dragged
      global plot
      $w move selected [expr $x-$plot(lastX)] \
        [expr $y-$plot(lastY)]
      set plot(lastX) $x
      set plot(lastY) $y
    }

Bourne Shell

Default shell on most Unix systems (sh or bash).

Good for writing "shell scripts:" parsing command-line arguments, invoking and controlling other commands, etc.

Example: The cc command.

Most C compilers are built from four pieces:

    Preprocessor (cpp)
    Actual compiler (cc1)
    Assembler (as)
    Linker (ld)
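Two pieces of arithmetic carry the "An Editable Graph" example: the mapping from data coordinates to canvas pixels in the "Draw points" loop, and the last-position bookkeeping that `plotDown` and `plotMove` use to turn mouse positions into relative moves. The Python sketch below reproduces only that arithmetic, with no GUI; the names `to_canvas` and `DragState` are assumptions made for this example.

```python
def to_canvas(px, py):
    """Map a data point to canvas pixels as the 'Draw points' loop does."""
    x = 100 + 3 * px
    # The Tcl expression divides integers, so use integer division here too.
    y = 250 - (4 * py) // 5
    return x, y

class DragState:
    """Counterpart of the plot(lastX) / plot(lastY) variables."""

    def __init__(self):
        self.last_x = 0
        self.last_y = 0

    def down(self, x, y):
        """Mouse button pressed on a point: remember where (plotDown)."""
        self.last_x, self.last_y = x, y

    def move(self, x, y):
        """Mouse dragged: return the offset since the last event (plotMove)."""
        dx, dy = x - self.last_x, y - self.last_y
        self.last_x, self.last_y = x, y
        return dx, dy

first = to_canvas(12, 56)
last = to_canvas(98, 223)
print(first, last)

drag = DragState()
drag.down(*first)
step1 = drag.move(140, 200)
step2 = drag.move(150, 200)
print(step1, step2)
```

Because the canvas y axis grows downward, larger data values map to smaller pixel values. Returning a delta from `move` mirrors the fact that the canvas `move` operation takes a relative offset, which is why the script must remember the previous mouse position.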

cc in sh

    #!/bin/sh

    # Set up command names
    root=/usr/lib
    cpp=$root/cpp
    cc1=$root/cc1
    as=/usr/bin/as
    ld=/usr/bin/ld

    # Complaint function
    usage() {
      echo "usage: $0 [options] files ..." 1>&2
      exit 1
    }

    # Default output filename
    outfile="a.out"

    # Parse command-line options
    while [ ! -z "$1" ];
      do case x"$1" in
        x-v) echo "Stephen's cc 1.0"; exit 0 ;;
        x-o) shift; outfile=$1 ;;
        x-c) stopafterassemble=1 ;;
        x-S) stopaftercompile=1 ;;
        x-E) stopafterpreprocess=1 ;;
        x-*) echo "Unknown option $1" 1>&2; usage ;;
        *) break ;;
      esac
      shift
    done

    # Initialize lists of files to process
    cfiles=""
    sfiles=""
    ofiles="crt1.o"

    if [ $# = 0 ]; then
      echo "$0: No input files" 1>&2; exit 1
    fi

    # Parse filenames
    while [ ! -z "$1" ];
      do case x"$1" in
        x*.c) cfiles="$cfiles $1" ;;
        x*.s) sfiles="$sfiles $1" ;;
        x*.o | x*.a) ofiles="$ofiles $1" ;;
        *) echo "Unrecognized file type $1" 1>&2; exit 1 ;;
      esac
      shift
    done

    # Run preprocessor standalone
    if [ "$stopafterpreprocess" ]; then
      for file in $cfiles; do
        $cpp $file
      done
      exit 0
    fi

cc in sh

    # Preprocess and compile to assembly
    for file in $cfiles; do
      asmfile=`echo $file | sed s/.c$/.s/`
      $cpp $file | $cc1 > $asmfile
      sfiles="$sfiles $asmfile"
    done
    if [ "$stopaftercompile" ]; then exit 0; fi

    # Assemble object files
    for file in $sfiles; do
      objfile=`echo $file | sed s/.s$/.o/`
      $as -o $objfile $file
      ofiles="$ofiles $objfile"
    done
    if [ "$stopafterassemble" ]; then exit 0; fi

    # Link to build executable
    $ld -o $outfile $ofiles
    exit 0

Scripting Languages Compared

                 awk   Perl   Python   Tcl   sh
    Shell-like    N     N       N       Y     Y
    Reg. Exp.     B     A       C       C     D
    Types         C     B       A       B     D
    Structure     C     B       A       B     C
    Syntax        B     F       A       B     C
    Semantics     A     C       A       B     B
    Speed         B     A       A       B     C
    Libraries     C     A       A       B     C
    Power         B     A       A       B     C
    Verbosity     B     A       C       C     B

What To Use When

awk: Best for simple text-processing (file of fields)
Perl: Best for legacy things, things requiring regexps
Python: Best all-around, especially for large programs
Tcl: Best for command languages, GUIs
sh: Best for portable "invoking" scripts
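The last stages of the "cc in sh" script form a small pipeline: each .c file becomes a .s file, each .s file becomes a .o file, and all object files are linked. The Python sketch below only computes the command lines the script would run, without executing anything, to make the file-list bookkeeping visible. The function `plan` and its `stop_after` parameter are assumptions made for this example; they stand in for the script's `stopaftercompile` and `stopafterassemble` flags.

```python
import re

def plan(cfiles, sfiles=(), ofiles=("crt1.o",), outfile="a.out",
         stop_after=None):
    """List the commands the driver would run, in order (nothing is executed)."""
    commands = []
    sfiles = list(sfiles)
    ofiles = list(ofiles)

    # Preprocess and compile each C file to assembly.
    for cfile in cfiles:
        asmfile = re.sub(r"\.c$", ".s", cfile)
        commands.append("cpp %s | cc1 > %s" % (cfile, asmfile))
        sfiles.append(asmfile)
    if stop_after == "compile":        # like the -S option
        return commands

    # Assemble every assembly file, including those given on the command line.
    for sfile in sfiles:
        objfile = re.sub(r"\.s$", ".o", sfile)
        commands.append("as -o %s %s" % (objfile, sfile))
        ofiles.append(objfile)
    if stop_after == "assemble":       # like the -c option
        return commands

    # Link all object files into the executable.
    commands.append("ld -o %s %s" % (outfile, " ".join(ofiles)))
    return commands

steps = plan(["main.c", "util.c"], sfiles=["start.s"])
for step in steps:
    print(step)
```

As in the shell script, generated files are appended to the lists for the next stage, so an assembly file named on the command line is assembled before the ones produced from C sources, and `crt1.o` is always the first object passed to the linker.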