
Introduction to Python

GPS Resource Seminar Eh Tan Nov. 17, 2005

What is Python?

• Python is a portable, interpreted, interactive, extensible, object-oriented programming language.
  o Portable: runs on Linux, many types of Unix, Windows, Mac, OS/2, …
    Comment: Python is a standard package and an administrative tool in many Linux distributions.
  o Interpreted: no compiling and linking, fast edit-test-debug cycle
    Comment: Development in Python is 5-10x faster than in C/C++.
  o Interactive: test while you code
    Comment: Forget the syntax? Unsure about the outcome of a function? Test it in the interpreter.
  o Extensible: interface with C/C++
    Comment: With simple wrappers, Python can call C/C++ functions.
• Often compared to Tcl, Perl, or Java.
• Clear and elegant syntax -- easy to learn and read
  Comment: Readability of code is very important when maintaining your own or others' code.
• Powerful -- suitable for all kinds of work
  o sourceforge.net language statistics:
    . C++: 16544 projects
    . Java: 16473 projects
    . C: 15772 projects
    . PHP: 11970 projects
    . Perl: 6155 projects
    . Python: 4457 projects
    . …
    . Fortran: 165 projects
• Current version: 2.4
  Comment: New features are added to Python in every new version. If I am using any feature introduced after v2.3, I will put a remark.

Features of Python

• Automatic memory management
• Strong and dynamic typing
• High-level data structures (list and dict)
• Strong support for string operations (including regular expressions)
• Namespaces
• Modular design
• Support for procedural, functional, and object-oriented programming
• Interfaces to system calls and libraries
• Interfaces to various GUI systems (GTK, Qt, Motif, Tk, Mac, MFC, wxWidgets)

When to use Python?

• Almost every situation, except:
  o Simple scripts for running a batch of commands (use a shell script instead)
  o Serious number crunching (use Fortran or C instead). However, there are Python extension modules that can give you the speed of C!
    . Take a look at SciPy (numeric.scipy.org) and ScientificPython (starship.python.net/~hinsen/ScientificPython)
  o Plotting and visualization (use Matlab, IDL, or GMT instead). Python packages in this area:
    . matplotlib (matplotlib.sourceforge.net)
    . MayaVi (mayavi.sourceforge.net)
    . ParaView (www.paraview.org)

Resources

• Online resources
  o Python interpreter online help
      >>> help(foo)     (print out the documentation of foo)
      >>> dir(foo)      (print out the methods of foo)
    Comment: '>>>' is the prompt of the Python interpreter.
  o www.python.org/doc contains many links to tutorials, ebooks, and references. The Python tutorial (docs.python.org/tut/tut.html) is the official tutorial (and is always up to date).
  o diveintopython.org has a good tutorial for experienced programmers who are new to Python
• Books
  o Core Python Programming, W. Chun, Prentice Hall -- good for beginners and advanced programmers (out of print)
  o Learning Python, M. Lutz and D. Ascher, O'Reilly & Associates -- good beginner's book
  o Python in a Nutshell, A. Martelli, O'Reilly & Associates -- good for experienced programmers and as a language reference
  o Python Essential Reference, D. Beazley, New Riders -- desktop reference

Python Editors

• Emacs or XEmacs with python-mode
• Vi or Vim
• Integrated development environments: IDLE, IPython, ActivePython

Programming in Python

• Case sensitive
• # starts a comment
• Indentation matters
    if t > 0:
        positive = True
        if verbose:
            print "t is positive"
    else:
        positive = False
    Comment: Notice that there are no parentheses around the condition, and that a ":" follows the condition and the "else".
    Comment: True and False are Python constants.
    Comment: This "else" follows the first "if".
• Line continuation
  o Trailing backslash
      s = 1 + \
          2
  o (…), […], {…} can span multiple lines
      s = (1 +
           2)
• No variable declaration
• Use of an undefined variable is an error
• Variables are references (like C pointers) to objects
    Comment: Many C/Fortran programmers get confused at this point. Read the following example carefully.
    >>> a = b = 5           # a and b are both references to 5
    >>> b = 4               # b now refers to 4, but a is unchanged
    >>> print a, b
    5 4
    >>> a = b = [1, 2]      # a and b are both references to one list (a list is like an array)
    >>> b[0] = 0            # the first element of the list is changed to 0
    >>> print a, b          # both a and b see the change
    [0, 2] [0, 2]

print

• Output to stdout
    >>> print 123
    123
• Adds a newline character at the end (suppressed by a trailing comma)
    >>> for i in [1,2,3]: print i
    1
    2
    3
    >>> for i in [1,2,3]: print i,
    1 2 3
• Adds a space between elements
    >>> print 1,2,3
    1 2 3
• Redirection
    >>> print >> f, 123
    Comment: f is a file/stream object (discussed later).

Numbers

• int
  o Decimal, octal, hexadecimal
      12 == 014, 12 == 0xC
  o No limit on # of digits
      10**100**2
  o Conversion
      int(12.6), int('12')
    Comment: Float-to-int conversion truncates toward zero, so int(12.6) is 12 and int(-12.6) is -12.
• float
  o Double precision
      12.0, 1.2e1
    Comment: There is no single-precision float in Python. The only place you will need it is during I/O. In this case, use the struct or array modules (discussed later).
  o Conversion
      float(12), float('1.2e1')
• complex
  o Append with 'j'
      3+4j
  o Conversion
      complex('3+4j'), complex(3,4)
• Operators: +, -, *, /, //, %, **
  o Division
      >>> print 3 / 5         # int/int returns int, rounded toward the floor
      0
      >>> print 3.0 / 5       # float/int or int/float returns float
      0.6
      >>> print 3.0 // 5      # // always returns a whole-number value (here as a float)
      0.0

      >>> from __future__ import division
      >>> print 3 / 5
      0.6
      >>> print 3.0 / 5
      0.6
      >>> print 3.0 // 5
      0.0
    Comment: In the future (Python 3.0), true division will become the default, and the import line will not be necessary.
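The difference between floor division and truncation only shows up with negative operands. The following is a minimal sketch, not part of the original handout; the variable names are illustrative, and it is written so that it should also run on a Python 3 interpreter (values are printed with a single parenthesized argument).

```python
# Floor division rounds toward negative infinity.
floor_quotient = -7 // 2

# int() drops the fractional part, i.e. it truncates toward zero.
truncated = int(-7 / 2.0)

# The remainder takes the sign of the divisor.
remainder = -7 % 2

# For integers, x == (x // y) * y + (x % y) always holds.
identity_holds = (-7 == floor_quotient * 2 + remainder)

print(floor_quotient)
print(truncated)
print(remainder)
print(identity_holds)
```

Here the floor quotient is -4 while the truncated value is -3, which is why the two notions should not be mixed when converting negative floats to indices.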

str

• Single or double quotes
    'abc', "abc", "I don't", 'He said "No"'
• Triple quotes for multi-line strings
    '''This is the first line
    the second
    the third'''
• Conversion
    str(12), str(3.2)
    Comment: Every object, not just numbers, can be converted to a string.
    Comment: To print a float with a specified number of digits, see the "String Formatting" section below.
• Escape characters
    >>> print 'abc\ndef'
    abc
    def
    >>> print 'abc\'def'
    abc'def
    >>> print 'abc\\def'
    abc\def
• Operators: +, *, in
    >>> print 'abc' + 'def'
    abcdef
    >>> print 'abc'*3
    abcabcabc
    >>> print 'a' in 'abc'
    True
    Comment: 'a' is a string of length 1. There is no char type in Python.

str Methods

• Query on the nature of characters: isspace(), isalpha(), isdigit(), isalnum()
• Case manipulation: lower(), upper(), swapcase(), title()
• Whitespace removal and splitting: strip(), lstrip(), rstrip(), split(), splitlines()
• Justification: ljust(), rjust(), center(), zfill()
• Search: count(), find(), rfind(), index(), rindex(), startswith(), endswith()
• Search and replace: replace(), translate()
• Joining a list of strings: join()
    greeting = 'Hello' + ' ' + 'Mr.' + ' ' + 'Tan'
    greeting = ' '.join(['Hello', 'Mr.', 'Tan'])
    Comment: Both methods give the same result, but the 2nd method is faster.

list

• [elem1, …]
• Useful function: range()
    >>> print range(3)          # integers in [0, 3)
    [0, 1, 2]
    >>> print range(1,3)        # integers in [1, 3)
    [1, 2]
    >>> print range(1,6,2)      # integers in [1, 6) with stride 2
    [1, 3, 5]
• Like a C array, but can contain different types of elements
    >>> alist = [1, 3.5, 'a', [1,2,3]]
• Access by index, index starts from 0
    >>> print alist[2]
    a
• Negative index
    >>> print alist[-1]         # the last element
    [1, 2, 3]
    >>> print alist[-2]
    a
    Comment: Think of alist[-1] == alist[n-1], where n = len(alist).
• Index slice
    >>> print alist[0:2]        # [0, 2)
    [1, 3.5]
    >>> print alist[::2]        # [0, len(alist)) with stride 2
    [1, 'a']
    >>> print alist[-2:]        # the last 2 elements: [n-2, n), n = len(alist)
    ['a', [1, 2, 3]]
• Operators: +, *, in, del
    >>> del alist[0]
    >>> print alist
    [3.5, 'a', [1, 2, 3]]

list Comprehension

• A convenient way to generate a list
• For example, the even integers less than 10, as a math formula:
    s = { x | x ∈ [0, 1, 2, …, 9], (x%2) == 0 }
  Similarly, in Python:
    s = [ x for x in range(10) if (x%2) == 0 ]
• Another example: take the square root of every non-negative integer less than 10 and convert each result to a string
    s = [ str(sqrt(x)) for x in range(10) ]
    Comment: You need to import sqrt() from the math module first (discussed later).

list Methods

• Add element: append(), insert(), extend()
• Remove element: remove(), pop()
• Search: count(), index()
• Sort: reverse(), sort()
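A list comprehension can combine a filter and a transformation, and the list methods above modify the list in place. The sketch below is illustrative only (the names `roots`, `labels`, and `names` are made up for this example) and should run on Python 2 and Python 3 alike.

```python
import math

# Square roots of the even integers below 10.
roots = [math.sqrt(x) for x in range(10) if x % 2 == 0]

# Convert each root to a string with two decimals.
labels = ['%.2f' % r for r in roots]

# List methods change the list in place and return None.
names = ['tan', 'sarah', 'eh']
names.append('bob')     # add one element at the end
names.sort()            # alphabetical order
names.reverse()         # reverse the sorted order

print(labels)
print(names)
```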

tuple

• (elem1, …)
• Single-element tuple
    >>> print (1)       # this creates an integer 1, not a tuple; the () are interpreted as arithmetic grouping
    1
    >>> print (1,)      # the trailing comma indicates we want a single-element tuple
    (1,)
• Similar to list, but immutable
    >>> atuple = (1, 2, 'a')
    >>> print atuple[1]
    2
    >>> atuple[2] = 3
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: object does not support item assignment
    Comment: The "File" line shows the location of the error.
    Comment: The last line shows the category of the error and a detailed error message.
    Comment: A string can be considered a special case of tuple that contains only characters. String, list, and tuple are the typical sequence objects.
• Tuple unpacking
    >>> a, b, c = (1, 2, 3)
    >>> print a, b, c
    1 2 3
• Operators: +, *, in

dict

• {key1: value1, …}
• Access by key, like map in C++ or an associative array in Perl
    >>> adict = { 'name': 'Eh',
    ...           'office': '358 SM' }
    >>> print adict['name']
    Eh
• Elements are not ordered and cannot be accessed by index
• Operators: in, del
    >>> print 'name' in adict
    True
    >>> print 'Eh' in adict     # the 'in' operator checks the keys, not the values, of the dict
    False
    >>> del adict['name']
    >>> print adict
    {'office': '358 SM'}

dict Methods

• Content: keys(), values(), items(), popitem()
• Fail-safe access: get(), setdefault(), pop()
• Whole dict operation: copy(), clear(), update()
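The fail-safe access methods avoid a KeyError when a key may be absent. The following sketch, with made-up data and names, counts words with get() and groups them with setdefault(); it is only one possible way to use these methods.

```python
words = ['list', 'dict', 'list', 'tuple', 'dict', 'list']

# get(key, default) returns the default instead of raising KeyError.
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1

# setdefault(key, default) inserts the default if the key is missing,
# then returns the stored value, so it can be modified right away.
by_length = {}
for w in words:
    by_length.setdefault(len(w), []).append(w)

# Without a default, get() returns None for a missing key.
missing = counts.get('str')

print(counts['list'])
print(by_length[5])
print(missing)
```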

String Formatting

• Same syntax as the sprintf() function in C
  o %d: integer
  o %f, %e, %g: float
  o %s: string, or any object (converted to string automatically)
• format_string % tuple
    Comment: Matched by position.
    >>> print '%s is about %5.3f' % ('PI', 3.1415926)
    PI is about 3.142
• format_string % dict
    Comment: Matched by key.
    >>> proj = 'X6'; xmin = 0; xmax = 1; ymin = 2; ymax = 3
    >>> psfile = 'test.ps'
    >>> print 'psbasemap -J%(proj)s \
    ... -R%(xmin)d/%(xmax)d/%(ymin)d/%(ymax)d >> \
    ... %(psfile)s' % vars()
    psbasemap -JX6 -R0/1/2/3 >> test.ps
    Comment: vars() is a dict containing all currently defined variable:value pairs.
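Width, precision, and padding flags work as in C's printf. The sketch below uses a hypothetical record `row` to show the same values formatted by position and by key, plus a few common flags.

```python
# A made-up record used only for illustration.
row = {'name': 'PI', 'value': 3.1415926, 'count': 7}

# Matched by position: the tuple supplies values left to right.
by_position = '%s = %8.4f (%d)' % (row['name'], row['value'], row['count'])

# Matched by key: the dict supplies values by name.
by_key = '%(name)s = %(value)8.4f (%(count)d)' % row

# Zero padding, left justification, right justification, exponent form.
padded = '%05d|%-6s|%6s|%.2e' % (42, 'ab', 'cd', 12345.678)

print(by_position)
print(by_key)
print(padded)
```

Both forms produce the same string; the keyed form is easier to maintain when the format has many fields.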

Modules and import

• Modules contain externally defined constants/functions/classes
• Every module has a name, stored in __name__
  Comment: __name__ is the filename of the module (.py removed). The __name__ of the top-level script or the interactive interpreter is '__main__'.
• Modules must be imported before use
• import module
    import math
    s = math.sin(0.5)
• import module as alias
    import math as m
    s = m.sin(0.5)
• from module import name [as alias]
    from math import sin
    s = sin(0.5)
• Module search path: sys.path (initialized from the script's directory, the environment variable PYTHONPATH, and installation defaults)

Comparison Operators

• Boolean constants: True, False
• Another constant: None
• ==, !=, <, <=, >, >=
    >>> print 0 == 0.0
    True
• is (object identity), not, and, or
    >>> print 0 is 0.0
    False
• Every object evaluates as True, except False, 0, 0.0, None, empty sequences, and an empty dict
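The distinction between == (equal value) and is (same object) matters most for mutable objects. This is a small illustrative sketch with made-up names; it also collects the values that evaluate as false.

```python
a = [1, 2]
b = [1, 2]      # a different list with the same content
c = a           # another reference to the same list

equal_ab = (a == b)     # same value
same_ab = (a is b)      # not the same object
same_ac = (a is c)      # the same object

# All of these evaluate as false in a condition.
falsy = [x for x in (0, 0.0, None, '', [], (), {}) if not x]

# Non-empty containers are true even if their content is "false".
truthy = bool([0]) and bool('0')

print(equal_ab)
print(same_ab)
print(same_ac)
print(len(falsy))
print(truthy)
```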

if

• if cond1: … elif cond2: … else: …
    if a > 0:
        print '%f is positive' % a
    elif a < 0:
        print '%f is negative' % a
    else:
        print '%f is zero' % a
• No 'case' or 'switch' statement

for

• for var in seq: …
    for filename in ['a.txt', 'b.txt']:
        do_something(filename)
    for i in range(10):
        do_something(i)
• Don't modify the sequence being iterated over inside the loop

while

• while cond: …
    found = False
    while not found:
        found, result = some_search_func()
    print result

Other Control Flow Statements

• Do nothing: pass
• Stop and exit the innermost loop: break
• Continue with the next iteration of the loop: continue
• A for loop or while loop can have an else block, which is executed if the loop is not terminated by a break
    >>> for n in range(2, 10):
    ...     for x in range(2, n):
    ...         if n % x == 0:
    ...             print n, 'equals', x, '*', n/x
    ...             break
    ...     else:
    ...         # loop fell through without finding a factor
    ...         print n, 'is a prime number'
    ...
    2 is a prime number
    3 is a prime number
    4 equals 2 * 2
    5 is a prime number
    6 equals 2 * 3
    7 is a prime number
    8 equals 2 * 4
    9 equals 3 * 3
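The loop-else construct is handy inside a search function, where the else block handles the "nothing found" case. The function name `first_factor` below is hypothetical; this is only a sketch of the pattern.

```python
def first_factor(n):
    """Return the smallest factor of n in [2, n), or None if there is none."""
    for x in range(2, n):
        if n % x == 0:
            break           # a factor was found; skip the else block
    else:
        return None         # loop finished without break: no factor
    return x

print(first_factor(9))
print(first_factor(7))
```

For an empty range (for example n = 2) the loop body never runs, so the else block executes and None is returned.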

Errors and Exceptions

• Errors are detected by Python and can be handled by users
• The traceback shows where the erroneous line is
    >>> '2' + 2
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: cannot concatenate 'str' and 'int' objects
    Comment: The "File" line shows the location of the error. Because the error occurs during an interactive session, the filename is "<stdin>", the line number is 1, and the function name is unknown. If the error occurs during a function call, both the location of the error and the place where the function is called are shown in the traceback.
• try: … except exception: … else: …
    try:
        a + 's'                     # testing whether or not variable a is a string
    except TypeError:
        print 'a is not a string'
    else:
        some_string_function(a)
    Comment: A bare except will catch any exception.
    Comment: The else block is executed if the try block doesn't trigger any exception.
• try: … finally: …
    f = open(filename, 'w')         # open a file for writing
    try:
        for i in seq:
            f.write(i)
    finally:
        f.close()                   # f is always closed (and its buffer flushed) when the code exits
    Comment: The finally block is executed whether or not an exception has occurred in the try block. It is useful for clean-up actions.
• raise exception, error_msg
    found, result = some_search_func()
    if not found:
        raise Exception, 'target not found'
    Comment: Exception is a Python built-in exception.

I/O

• open(filename, access_mode)
    Comment: file() is an alias of open().
    f = open('table.txt')           # the default access mode is to read a text file
    for line in f:                  # f can behave like a list of lines
        process(line)
• Access mode
  o read (default): 'r'
  o overwrite: 'w'
  o append: 'a'
  o read+write: 'r+'
  o binary: 'b' (appended to one of the above, e.g. 'rb')
• System I/O: sys.stdin, sys.stdout, sys.stderr
• Wrap important file operations in a try-finally block
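The file handling and exception rules above can be combined in one round trip. The sketch below is an illustration under stated assumptions: it writes to a temporary file created with tempfile.mkstemp() (not mentioned in the handout), and all variable names are made up. It should run on Python 2 and Python 3.

```python
import os
import tempfile

# Create a scratch file so the example does not touch real data.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

# Write three lines; finally guarantees the file is closed.
f = open(path, 'w')
try:
    for i in range(3):
        f.write('line %d\n' % i)
finally:
    f.close()

# Read the lines back by iterating over the file object.
f = open(path)
try:
    lines = [line.rstrip('\n') for line in f]
finally:
    f.close()

os.remove(path)

# Opening a file that no longer exists raises an exception we can catch.
try:
    open(path)
except IOError:
    reopened = False
else:
    reopened = True

print(lines)
print(reopened)
```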

File Object Methods

• Close: close()
• Read: read(), readline(), readlines()
• Write: write(), writelines()
• Position: seek(), tell()

Command Line

• raw_input() returns keyboard input as a string
    age = int(raw_input('Enter your age:'))
• Command line arguments are stored in a list: sys.argv[]
    Comment: sys.argv[0] is the filename of the invoked script.
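Because sys.argv is an ordinary list of strings, argument handling can be written as a function that takes the list as a parameter, which makes it easy to try out in the interpreter. The function `parse_age` and the sample argument lists below are hypothetical.

```python
def parse_age(argv):
    """Return the age given as the first argument, or None if it is
    missing or not an integer. argv has the same layout as sys.argv."""
    if len(argv) < 2:
        return None             # only the script name was given
    try:
        return int(argv[1])     # arguments always arrive as strings
    except ValueError:
        return None

# Simulated command lines; argv[0] is the script name.
print(parse_age(['age.py', '42']))
print(parse_age(['age.py']))
print(parse_age(['age.py', 'forty']))
```

In a real script the call would be parse_age(sys.argv) after importing sys.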

Function

• def funcname(arg1, arg2, …): …
    >>> def add(arg1, arg2, arg3):
    ...     return arg1 + arg2 + arg3
    ...
    >>> add(1,2,3)
    6
    >>> add('1','2','3')        # as long as the arguments support the '+' operator, add() will work
    '123'
• If more than one item needs to be returned, return a tuple
    >>> def random2():
    ...     import random
    ...     return (random.random(), random.random())    # the outermost () is optional
    ...
    >>> a, b = random2()
• When there is no return, None is returned
    >>> def change(i):
    ...     i = i+1
    ...
    >>> k = change(1)
    >>> print k
    None
• When an argument of immutable type (e.g. number, string, tuple) is reassigned in the function, only the local name is rebound to a new object, and the caller is not affected
    >>> i = 1
    >>> change(i)
    >>> print i
    1
    >>> def change2(i):
    ...     return i+1
    ...
    >>> i = change2(i)
    >>> print i
    2
• When an argument of mutable type (e.g. list, dict) is modified in the function, the caller will see the modification. This is usually unintended and bad. If the caller needs the modification, the function should make a copy of the original argument, modify the copy, then return it
    >>> def change3(i):
    ...     i.append(3)         # i is modified, a new element (3) is appended
    ...
    >>> k = [1, 2]
    >>> change3(k)
    >>> print k
    [1, 2, 3]
    >>> def change4(i):
    ...     return i+[4]        # a new list is created and returned
    ...
    >>> m = change4(k)
    >>> print k
    [1, 2, 3]
    >>> print m
    [1, 2, 3, 4]
• Default argument values
    >>> def add(arg1, arg2, arg3='3'):
    ...     return arg1 + arg2 + arg3
    ...
    >>> add('1','2')
    '123'
• Keyword arguments
    >>> add(arg2='2', arg1='1')
    '123'
• Arbitrary arguments
    >>> def add(arg1, *args):
    ...     result = arg1
    ...     for i in args:
    ...         result += i
    ...     return result
    ...
    >>> add(1,2,3,4,5)
    15
    >>> add(1)
    1
    >>> add('1','2','3','4','5')
    '12345'

Anonymous function: lambda

• lambda arg1, …: expression
    >>> add = lambda arg1, arg2: arg1+arg2
    >>> add(1,2)
    3
    Comment: This is not a typical usage of lambda, which is often used when you need a simple function but don't want to give it a name.
• This kind of anonymous function is useful in functional programming
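Two of the points above lend themselves to a short illustration: copying a mutable argument before changing it, and passing a lambda where a small unnamed function is needed. The function `with_appended` and the sample data are made up for this sketch.

```python
def with_appended(seq, item):
    """Return a new list with item added; the caller's list is untouched."""
    result = list(seq)      # make a (shallow) copy first
    result.append(item)     # modify only the copy
    return result

original = [1, 2]
extended = with_appended(original, 3)

# lambda used as a throw-away argument to other functions.
people = [('Tan', 35), ('Sarah', 28), ('Bob', 41)]      # hypothetical data
by_age = sorted(people, key=lambda p: p[1])             # sort by 2nd field
squares = list(map(lambda x: x * x, range(5)))
evens = list(filter(lambda x: x % 2 == 0, range(10)))

print(original)
print(extended)
print(by_age)
print(squares)
print(evens)
```

The list() calls around map() and filter() are only needed on Python 3, where these functions return iterators; on Python 2 they are harmless.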

Class

• class classname(parentclass): ...
    >>> class person(object):
    ...     def __init__(self, name):
    ...         self.name = name
    ...     def greeting(self):
    ...         print 'Hello', self.name
    ...
    >>> tan2 = person('Eh')
    >>> sarah = person('Sarah')
    >>> tan2.greeting()
    Hello Eh
    >>> sarah.greeting()
    Hello Sarah
• Most of the time, parentclass is object
    Comment: object is a Python built-in class. Every Python entity inherits from object.
• __init__() is the constructor
• Usually there is no need for a destructor
• Defining a class method is similar to defining a function, except that the first argument is (by convention) called 'self'
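When parentclass is something other than object, the new class inherits the parent's methods and may override them. The sketch below is illustrative: the classes `Person` and `Student` and the attribute `school` are hypothetical, and greeting() returns a string instead of printing so that the result is easy to inspect.

```python
class Person(object):
    def __init__(self, name):
        self.name = name

    def greeting(self):
        return 'Hello ' + self.name


class Student(Person):
    def __init__(self, name, school):
        # Call the parent constructor explicitly, then add new state.
        Person.__init__(self, name)
        self.school = school

    def greeting(self):
        # Reuse the parent's method and extend its result.
        return Person.greeting(self) + ' from ' + self.school


s = Student('Sarah', 'GPS')
print(s.greeting())
print(isinstance(s, Person))
print(issubclass(Student, Person))
```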

Built-in Functions

• Sequence related: len(), range()
• Number related: abs(), min(), max(), sum()
• Class hierarchy: isinstance(), issubclass()
• Namespace: vars(), locals(), globals()

Standard Library

• sys -- Python environment related operations
  o argv[]: command line arguments
  o exit(): exit from Python
  o path[]: search path for importing modules, can be customized by the environment variable PYTHONPATH
  o stdin, stdout, stderr: system defined I/O streams
• os -- OS related operations
  o environ{}: environment variables
  o system(): execute an external command
  o popen(): open a pipe to or from a command
• os.path -- OS-independent pathname operations
  o dirname(): directory name
  o basename(): base name
  o exists(): test existence of a pathname
  o getsize(): get file size
• glob -- Unix-style pathname pattern expansion
• math -- Mathematical functions
• cmath -- Mathematical functions for complex numbers
• array -- Array of homogeneous data (optimized for speed and memory)
• struct -- Array of heterogeneous data (optimized for speed)
• random -- Random number generator
• re -- Regular expression
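As a brief illustration of two of these modules, the sketch below packs a float into single and double precision with struct and splits a pathname with os.path. The path is a made-up example and no file is actually accessed.

```python
import os.path
import struct

# Pack the same value as a 4-byte single and an 8-byte double.
packed_single = struct.pack('f', 1.5)
packed_double = struct.pack('d', 1.5)

# unpack() always returns a tuple, even for a single value.
(value,) = struct.unpack('f', packed_single)

# Build and split a pathname in an OS-independent way.
path = os.path.join('data', 'single.myconvert')    # hypothetical path
dirname = os.path.dirname(path)
basename = os.path.basename(path)

print(len(packed_single))
print(len(packed_double))
print(value)
print(dirname)
print(basename)
```

The value 1.5 is exactly representable in single precision, so it survives the round trip unchanged; most decimal fractions would not.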

Example

• Reproduce part of the behavior of the 'gmtconvert' command
  o Usage: gmtconvert filename -bi[s]<n>
  o Read in a binary file as double-precision (or single-precision if 's' is appended after '-bi') floats, and print the content in n columns to stdout
• The script and example data can be found at Datalib. Try the following commands:
  o myconvert.py single.myconvert -bis1
  o myconvert.py single.myconvert -bis2
  o myconvert.py double.myconvert -bi2
• myconvert.py:

    #!/usr/bin/env python
    # Comment: The line above tells the shell where to find the Python
    # interpreter.
    '''Usage: myconvert.py filename -bi[s]<n>

    Read in a binary file as double-precision (or single-precision if 's'
    is appended after '-bi') float, and print the content in n columns to
    stdout
    '''
    # Comment: If the first statement of the code is a string, this string
    # is stored in a variable called '__doc__' and is called the docstring.
    # By convention, this string should be the documentation of the code.


    def get_arg(argv):
        '''Parsing command line argument'''

        # wrap the argument parsing in a try block so that,
        # in case of error, the error can be captured and
        # the correct syntax of usage can be printed
        try:
            # the first argument is the input filename
            filename = argv[1]

            # the second argument must start with '-bi'
            arg = argv[2]
            if not arg.startswith('-bi'):
                raise Exception, 'syntax error'

            # remove '-bi' from arg
            arg = arg[3:]

            # if the 1st char is 's', the data is single precision,
            # otherwise, the data is double precision
            if arg.startswith('s'):
                precision = 'single'
                # remove 's'
                arg = arg[1:]
            else:
                precision = 'double'

            # the rest of arg must be a positive integer, indicating
            # the number of columns
            columns = int(arg)
            if columns <= 0:
                raise Exception, 'syntax error'

        except:
            # if there is any exception, the command line arguments
            # are incorrect, print the correct syntax and exit
            print __doc__       # Comment: print the docstring
            import sys
            sys.exit(1)

        # Comment: multiple items are returned
        return filename, precision, columns


    def main(filename, precision, col):
        # count how many elements are in the file
        size = count_size(filename, precision)

        # sanity check
        if size % col:
            raise Exception, 'file size not divisible by columns'

        # open the file in binary mode for reading
        f = open(filename, 'rb')

        # read the file
        elements = readfile(f, precision, size)

        # print the elements in col columns
        print_n_columns(elements, col)

        return


    def count_size(filename, precision):
        '''count how many elements of data are in the file'''

        # get file size (in bytes)
        import os
        bytes = os.path.getsize(filename)

        # from bytes to # of elements
        # each single takes 4 bytes
        # each double takes 8 bytes
        if precision == 'single':
            size = bytes / 4
        else:
            size = bytes / 8

        return size


    def readfile(f, precision, size):
        # read the binary file using array module
        import array
        if precision == 'single':
            # create an empty single precision array
            # Comment: 'content' is an array object and can only contain
            # single-precision floats as its elements
            content = array.array('f')
        else:
            # create an empty double precision array
            # Comment: 'content' is an array object and can only contain
            # double-precision floats as its elements
            content = array.array('d')

        # fill the array from file object f
        # Comment: fromfile() is a method of the array object. It reads
        # 'size' items from f.
        content.fromfile(f, size)

        return content


    def print_n_columns(seq, columns):
        # there will be 'columns' copies of '%g', separated by tabs
        format = '%g' + '\t%g' * (columns-1)

        # get the length of seq
        end = len(seq)

        # print seq in columns
        # Comment: seq[...] is a sequence (here a slice of the array) and
        # must be converted to a tuple for string formatting
        for i in range(0, end, columns):
            print format % tuple( seq[ i : i+columns ] )


    # Comment: If __name__ == '__main__', the script is invoked from the
    # command line (not being imported by another module)
    if __name__ == '__main__':
        import sys
        # Comment: multiple items are received from get_arg()
        filename, precision, columns = get_arg(sys.argv)
        main(filename, precision, columns)

    # end of file
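If the example data files are not at hand, a small single-precision file can be produced with the array module. The sketch below is not part of the original script: it writes six made-up values to a temporary file, reads them back the way readfile() does, and builds the two-column rows that print_n_columns() would print. It is written so that it should also run on Python 3.

```python
import array
import os
import tempfile

# Six single-precision values, all exactly representable as floats.
values = array.array('f', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Write them as raw binary to a scratch file.
fd, path = tempfile.mkstemp(suffix='.myconvert')
f = os.fdopen(fd, 'wb')
try:
    values.tofile(f)
finally:
    f.close()

# File size in bytes divided by the item size gives the element count.
n_elements = os.path.getsize(path) // values.itemsize

# Read the data back into an empty array of the same type.
readback = array.array('f')
f = open(path, 'rb')
try:
    readback.fromfile(f, n_elements)
finally:
    f.close()
os.remove(path)

# Format the elements in two tab-separated columns.
columns = 2
fmt = '%g' + '\t%g' * (columns - 1)
rows = [fmt % tuple(readback[i:i + columns])
        for i in range(0, n_elements, columns)]

for row in rows:
    print(row)
```

Note the use of // for the element count: with true division the result would be a float, which fromfile() does not accept as a count.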