Scripting For GIS Analysis

Snowbird, Utah

Tyler Cruickshank, Utah Division of Air Quality (Utah DAQ)
September 30, 2004

Q: What is Perl?
A: A high-level, easy-to-use language.
A: Uses the best features of other languages.
A: Great for file manipulation, text processing, web, …

Q: Why would I want to bother learning Perl?
A: Eliminate manual processing through automation.
A: Speed.
A: Avoid Excel headaches.

Q: Isn’t programming only for “programmers”?
A: ANYONE can use perl.
A: No compiling.
A: Great resources available.

Q: How do I get perl?
A: It is easy and fast.

- Perl can be used on a PC, Mac, UNIX, or Linux.
- Likely already installed on UNIX & Linux.

Unix / Linux: www..org
PC: www.activestate.com
Mac: www.macperl.org

- For PC, download the installation executable and run it.

Q: What does perl look like?
A: A perl program/script is simply a text file.

    #!/usr/bin/perl
    print "Do you want to know how to run perl? ";
    my $variable = <STDIN>;   # Read the answer typed at the keyboard.
    chomp($variable);         # Remove the trailing newline before comparing.
    if($variable eq 'yes'){
      print "Ok, type 'perl myProgram.pl' at the command line!\n";
    }
    else{
      print "Maybe next time ...\n";
    }

Q: Examples ???
A: Process and transform multiple years of input files.
- Raw emission input files need to be formatted for GIS input.

We Have:
• 4 years of data in four separate files.

We Want:
• 4 new formatted files.
• Only NOX data.
• Daily NOX data instead of annual NOX data.

We Don’t Want:
• To spend hours manually creating new files with Excel.
• Errors because we did it manually.

Q: 6 perl tools that will get you through anything.
A: Focus on these 6 perl tools to complete our task.

1. Loop through the 4 files with “foreach”.

2. Open files for reading & writing with “filehandles”.

3. Read file contents with “while”.

4. Use and store file contents with “split”.

5. Test data with “if”.

6. Create a new file with “print”.

Q: Perl “foreach” loops.
A: Loop through a list of items.

Foreach: Executes a loop for each item in your list.

• Handy for processing several years of data.
• A list is defined as an array: @years = (2000,2002,2004,2006);
• This list (array) has 4 years in it.
• The foreach loop will execute exactly 4 times.
• Each loop will use the current list item as a variable.

Example:
    foreach $year (@years){
      … do something …
    }

Q: Perl “Filehandles”.
A: Opening files for reading & writing.

Filehandle: A connection between your program and a file.

• We use 2 filehandles:
  - 1 to open our read file.
  - 1 to open a new file for writing.
• We name our filehandle.
• When we want to read or write we refer to the filehandle name.

Example:
    open(IN,":/data/fileIn.csv") || die "Can't open IN\n";
    open(OUT,">c:/data/fileOut.csv") || die "Can't open OUT\n";

Q: Perl “while” loop.
A: Read input file with a “while” loop.

While: Repeat as long as the condition is true (i.e., until the end of the file).

• While loops are commonly used for reading files.

• While loops loop through our file line by line.

• Each line of the file is treated as 1 long string.
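The filehandle and while-loop points above can be sketched together. This is only an illustration: the sample text and variable names are hypothetical, and it opens a filehandle on a string (instead of a file on disk) so that it runs anywhere. It also uses a named variable for the filehandle, a common alternative to a bare name like IN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; a real script would open a file on disk instead.
my $text = "2341, NOx, 100, 25.5\n2678, SO2, 15, 1100\n";

# Open a filehandle on the string (Perl allows a scalar reference here).
open(my $in, '<', \$text) || die "Can't open in-memory input: $!\n";

my @lines;
while (my $line = <$in>) {   # Reads one line per loop until end of input.
    chomp($line);            # Remove the trailing newline.
    push(@lines, $line);     # Each line is one long string.
}
close($in);

print scalar(@lines), " lines read\n";
print "First line: $lines[0]\n";
```

To read a real file, the scalar reference would be replaced with a file name.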

Example:
    while(<IN>){
      … read and process my entire file …
    }

Photo: Salt Lake Valley Winter Inversions

Q: Perl “split” function.
A: “Split” each line into pieces.

Split: Breaks a line (string) from the file into pieces based on a separator.

• Split is great for processing comma-delimited (.csv) data exported from Excel.

• Split makes our line-by-line string data usable.

• Split breaks our data into “columns” based on a separator.

• Each “column” is stored in a list (array) for later access.

• A separator can be anything (number, letter, punctuation).
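To illustrate that the separator can be almost anything, here is a small sketch with three hypothetical sample strings (none of them come from the original data). The first argument to split is a pattern, so characters with a special meaning in patterns, such as "|", need a backslash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample strings, one per separator style.
my $piped  = "2341|NOx|100";
my $tabbed = "2341\tNOx\t100";
my $spaced = "2341, NOx , 100";

# The first argument to split is a pattern, so "|" must be escaped.
my @byPipe  = split(/\|/, $piped);
my @byTab   = split(/\t/, $tabbed);
# Allow optional spaces around the comma so fields come back trimmed.
my @byComma = split(/\s*,\s*/, $spaced);

print join(" ", @byPipe), "\n";
print join(" ", @byTab), "\n";
print join(" ", @byComma), "\n";
```

The last form is handy when a .csv file has spaces after its commas, because a plain split on "," would leave those spaces attached to each field.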

Example:
    $someString = "2341, NOx, 100, 25.5";
    @data = split(/,/,$someString);

Example Input File (In Data), read one line at a time:
    2341, NOx, 100, 25.5
    2341, NOx, 2500,100
    2678, SO2, 15, 1100
    2679, NOx, 10, 100

    @data = split(/,/,$_);

Out Data:
                 $data[0]  $data[1]  $data[2]  $data[3]
    First Loop   2341      NOx       100       25.5
    Second Loop  2341      NOx       2500      100
    Third Loop   2678      SO2       15        1100
    Fourth Loop  2679      NOx       10        100

• Each loop creates a new list (array) for us to work with.

Q: Perl “if” test.
A: The perl “if” is a test on a condition.

If: The perl “if” is a test on a condition.

• If allows us to filter out data or choose specific data.

• If statements can be simple or complex.

• If statements can contain multiple conditions.
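A short sketch of an if test with multiple conditions, using a hypothetical record (the values and the labels "keep", "review", and "skip" are made up for illustration). Note that Perl compares numbers with "==" and strings with "eq".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical record: year, pollutant name, and annual tons.
my ($year, $pollutant, $tons) = (2004, 'NOx', 2500);

my $label;
# "==" compares numbers, "eq" compares strings; "&&" requires both tests.
if ($year == 2004 && $pollutant eq 'NOx') {
    $label = 'keep';
}
elsif ($tons > 1000 || $pollutant eq 'SO2') {   # "||" requires either test.
    $label = 'review';
}
else {
    $label = 'skip';
}

print "Record is marked: $label\n";
```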

Example:
    if($year == 2004){
      … do something …
    }
    else{
      … do something else …
    }

Q: Perl “print” function.
A: The perl “print” writes to the screen or a file.

Print: The perl “print” creates new files.

• Print to the screen or print to a filehandle.
  - Print to a filehandle via the filehandle name.

• Formatted printing is allowed.
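As a sketch of formatted printing, the built-in sprintf (and its sibling printf) can control how numbers appear, for example how many decimal places to keep. The values below are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical values: a source ID, a pollutant, and annual tons.
my ($id, $pollutant, $annualTons) = (2341, 'NOx', 100);
my $dayTons = $annualTons / 365;

# sprintf builds the formatted string; printf would print it directly.
# %d = integer, %s = string, %.4f = number with 4 decimal places.
my $line = sprintf("%d,%s,%.4f", $id, $pollutant, $dayTons);

print "$line\n";
```

Like print, printf also accepts a filehandle name as its first word, so the same format could be written to an output file.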

Example:
    print "The year is $year\n";
    print OUT "The year is $year\n";

Q: How do I use these tools in a program?
A: For PC, in DOS window type “perl myProgram.pl”.

    #!/usr/bin/perl
    my @years = (2000,2002,2004,2006);
    foreach my $year (@years){                 # Loop thru each year.
      # Forward slashes avoid backslash escapes inside double quotes.
      open(IN,"c:/data/area.$year") || die "Can't open IN\n";
      open(OUT,">c:/data/areaOut.$year") || die "Can't open OUT\n";
      while(<IN>){                             # Loop thru each line of file.
        chomp;                                 # Remove the trailing newline.
        my @data = split(/\s*,\s*/,$_);        # Split on comma, dropping spaces around it.
        if($data[1] eq 'NOx'){                 # Get only NOX data.
          my $dayTons = $data[2]/365;
          print OUT "$data[0],$data[1],$dayTons\n";
        } # End if.
      } # End while.
      close(IN);
      close(OUT);
    } # End foreach.

Example Input File:
    2341, NOx, 100, 25.5
    2341, NOx, 2500,100
    2678, SO2, 15, 1100
    2679, NOx, 10, 100

Split Function:
    @data, First Line:
      $data[0] = 2341
      $data[1] = NOx
      $data[2] = 100
      $data[3] = 25.5
    @data, Second Line:
      $data[0] = 2341
      $data[1] = NOx
      $data[2] = 2500
      $data[3] = 100

Photo: Monte Cristo Pk (11,132’). Across The Street From Snowbird.
Photo: The Pfeifferhorn (11,352’). Behind The Cliff Lodge.

Q: What are some other examples of perl applications?
A: Let’s look at 5 of my favorites:

1. Data processing from Excel files.
2. Batch files or repetitive tasks.
3. Preparing GIS input files.
4. Processing NetCDF files.
5. Create graphics from web-accessed data.

Q: What are perl modules?
A: Hundreds of perl modules exist for complex tasks.

The Comprehensive Perl Archive Network (CPAN): A collection of perl software and documentation.

• Modules contain pre-built functions ready for use.

• Modules exist for many applications including:
  - WWW
  - Database interfaces
  - HTML
  - Images/graphing
  - Mail
  - NetCDF
  - PDL (Perl Data Language) extension

www.cpan.org

Q: How to efficiently work with NetCDF files?
A: Try PDL and the PDL NetCDF module.

• PDL (Perl Data Language) is a perl extension.
  - Allows fast storage and manipulation of large n-dimensional datasets.

• The NetCDF module contains functions for writing and reading NetCDF files.
  - Read/write to and from PDL data arrays.

http://pdl.perl.org

Q: 3 Basic steps for NetCDF processing.
A: NetCDF files can be big.
A: Use perl to extract and reformat NetCDF data for a GIS grid (asciigrid command).
A: Create GIS maps with grid data.
A: PDL & NetCDF.

1) Open a NetCDF file for reading (or writing).
2) Request specific data from the open NetCDF file.
3) Return requested data to a PDL array.

Example:

Open file …
    $file = PDL::NetCDF->new($filename);

Request and return data to PDL array …
    $pdlArray = $file->get($var,[$hr,0,0,0],[1,$layer,$row,$col]);

Now, more processing can be done on the array.
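One possible follow-on step, sketched in plain Perl so that it runs without PDL installed: formatting a small 2-D grid of values in the ESRI ASCII grid layout that GIS tools can import. Everything specific here is an assumption for illustration: the helper name gridToAscii, the grid values, the origin, the cell size, and the NODATA value. In practice the values would come from the array returned by get.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format rows of values as ESRI ASCII grid text.
sub gridToAscii {
    my ($rows, $xll, $yll, $cellsize, $nodata) = @_;
    my $nrows = scalar(@$rows);
    my $ncols = scalar(@{ $rows->[0] });
    my $text  = "ncols $ncols\n"
              . "nrows $nrows\n"
              . "xllcorner $xll\n"
              . "yllcorner $yll\n"
              . "cellsize $cellsize\n"
              . "NODATA_value $nodata\n";
    # The format lists the northernmost row first, one row per line.
    foreach my $row (@$rows) {
        $text .= join(" ", @$row) . "\n";
    }
    return $text;
}

# Made-up 2 x 3 grid; the first row is assumed to be the northern edge.
my @grid = ([1, 2, 3], [4, 5, 6]);
my $ascii = gridToAscii(\@grid, 0, 0, 1000, -9999);
print $ascii;
```

If the source data stores its southernmost row first, the rows would need to be reversed before writing, since this layout expects the top (northern) row first.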

PERL Summary
• Don’t fear it.
• Reduce manual efforts.
• Try the 6 tools on a simple task.
• Start small and simple.
• Explore perl modules.
• Pick up a perl book and use Google for perl help.
• Impress colleagues with speed, then relax!

Suggested Resources:

Books:
O’Reilly Publishers (www.perl.com):
- Learning Perl
- Perl Cookbook

Websites:
- www.cpan.org
- www.perlmonks.org
- Search “perl tutorial” using Google.

Photo: On his way to learn perl …