Generators and Coroutines
Part 1: Introduction to Iterators and Generators
Edited version of slides by David Beazley, http://www.dabeaz.com

Iteration

• As you know, Python has a "for" statement
• You use it to loop over a collection of items

>>> for x in [1,4,5,10]:
...     print x,
...
1 4 5 10
>>>

• And, as you have probably noticed, you can iterate over many different kinds of objects (not just lists)

Iterating over a String

• If you loop over a string, you get characters

>>> s = "Yow!"
>>> for c in s:
...     print c
...
Y
o
w
!
>>>

Iterating over a Dict

• If you loop over a dictionary you get keys

>>> prices = { 'GOOG' : 490.10,
...            'AAPL' : 145.23,
...            'YHOO' : 21.71 }
>>> for key in prices:
...     print key
...
YHOO
GOOG
AAPL
>>>

Iterating over a File

• If you loop over a file you get lines

>>> for line in open("real.txt"):
...     print line,
...
Real Programmers write in FORTRAN
Maybe they do now,
in this decadent era of
Lite beer, hand calculators, and "user-friendly" software
but back in the Good Old Days,
when the term "software" sounded funny
and Real Computers were made out of drums and vacuum tubes,
Real Programmers wrote in machine code.
Not FORTRAN. Not RATFOR. Not, even, assembly language.
Machine Code.
Raw, unadorned, inscrutable hexadecimal numbers.
Directly.
>>>

Consuming Iterables

• Many functions consume an "iterable" object
• Reductions: sum(s), min(s), max(s)
• Constructors: list(s), tuple(s), set(s), dict(s)
• in operator: item in s
• Many others in the library

Iteration Protocol

• An inside look at the for statement

for x in obj:
    # statements

• Underneath the covers

_iter = iter(obj)         # Get iterator object
while 1:
    try:
        x = _iter.next()  # Get next item
    except StopIteration: # No more items
        break
    # statements
    ...

• Any object that supports iter() and next() is said to be "iterable."
• The reason why you can iterate over different objects is that there is a specific protocol

>>> items = [1, 4, 5]
>>> it = iter(items)
>>> it.next()
1
>>> it.next()
4
>>> it.next()
5
>>> it.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

Supporting Iteration

• User-defined objects can support iteration
• Example: Counting down...

>>> for x in countdown(10):
...     print x,
...
10 9 8 7 6 5 4 3 2 1
>>>

• To do this, you just have to make the object implement __iter__() and next()
• Sample implementation

class countdown(object):
    def __init__(self,start):
        self.count = start
    def __iter__(self):
        return self
    def next(self):
        if self.count <= 0:
            raise StopIteration
        r = self.count
        self.count -= 1
        return r

Iteration Example

• Example use:

>>> c = countdown(5)
>>> for i in c:
...     print i,
...
5 4 3 2 1
>>>
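• As written, the countdown object is its own iterator, so a second for-loop over the same object produces nothing. A small variant (not from the slides) that hands out a fresh iterator on each pass shows one of the design subtleties hinted at in the commentary below:

class countdown2(object):
    # Re-iterable variant: each for-loop gets its own fresh iterator
    def __init__(self, start):
        self.start = start
    def __iter__(self):
        return countdown(self.start)   # reuse the countdown class defined above

>>> c = countdown2(5)
>>> for i in c:
...     print i,
...
5 4 3 2 1
>>> for i in c:
...     print i,        # works a second time: __iter__() handed out a new iterator
...
5 4 3 2 1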


Iteration Commentary

• There are many subtle details involving the design of iterators for various objects
• However, we're not going to cover that
• This isn't a tutorial on "iterators"
• We're talking about generators...

Part 2: Generators

Generators

• A generator is a function that produces a sequence of results instead of a single value

def countdown(n):
    while n > 0:
        yield n
        n -= 1

>>> for i in countdown(5):
...     print i,
...
5 4 3 2 1
>>>

• Instead of returning a value, you generate a series of values (using the yield statement)

Generators

• Behavior is quite different than a normal function
• Calling a generator function creates a generator object. However, it does not start running the function.

def countdown(n):
    print "Counting down from", n
    while n > 0:
        yield n
        n -= 1

>>> x = countdown(10)
>>> x
<generator object at 0x...>
>>>

Notice that no output was produced

Generator Functions

• The function only executes on next()

>>> x.next()                 # Function starts executing here
Counting down from 10
10
>>>

• yield produces a value, but suspends the function
• Function resumes on next call to next()

>>> x.next()
9
>>> x.next()
8
>>>

• When the generator returns, iteration stops

>>> x.next()
1
>>> x.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>

• A generator function is mainly a more convenient way of writing an iterator
• You don't have to worry about the iterator protocol (.next, .__iter__, etc.)
• It just works

Generators vs. Iterators

• A generator function is slightly different than an object that supports iteration
• A generator is a one-time operation. You can iterate over the generated data once, but if you want to do it again, you have to call the generator function again.
• This is different than a list (which you can iterate over as many times as you want)
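• A quick interactive check (not on the original slides) of the one-time behaviour, using the simple countdown() generator from the start of this part (the version without the print):

>>> x = countdown(3)
>>> list(x)
[3, 2, 1]
>>> list(x)                     # the generator is spent; a second pass yields nothing
[]
>>> items = [3, 2, 1]
>>> list(items), list(items)    # a list can be walked as many times as you want
([3, 2, 1], [3, 2, 1])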


Generator Expressions

• A generated version of a list comprehension

>>> a = [1,2,3,4]
>>> b = (2*x for x in a)
>>> b
<generator object at 0x...>
>>> for i in b: print i,
...
2 4 6 8
>>>

• This loops over a sequence of items and applies an operation to each item
• However, results are produced one at a time using a generator

Generator Expressions

• Important differences from a list comp.
• Does not construct a list.
• Only useful purpose is iteration
• Once consumed, can't be reused
• Example:

>>> a = [1,2,3,4]
>>> b = [2*x for x in a]
>>> b
[2, 4, 6, 8]
>>> c = (2*x for x in a)
>>>
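• The "does not construct a list" point is easy to see with a big input range; the generator expression below is created instantly and produces values only on demand (a small illustration, not from the slides):

>>> squares = (x*x for x in xrange(100000000))   # no list of 100 million results is built
>>> squares.next()
0
>>> squares.next()
1
>>> squares.next()
4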

Generator Expressions

• General syntax

(expression for i in s if cond1
            for j in t if cond2
            ...
            if condfinal)

• What it means

for i in s:
    if cond1:
        for j in t:
            if cond2:
                ...
                if condfinal:
                    yield expression
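• For instance, a two-loop expression with a filter expands exactly as shown above (a small illustration, not from the slides):

a = [1, 2, 3]
b = ['x', 'y']

pairs = ((i, j) for i in a if i != 2
                for j in b)

# The equivalent generator function
def gen_pairs():
    for i in a:
        if i != 2:
            for j in b:
                yield (i, j)

print list(pairs)        # [(1, 'x'), (1, 'y'), (3, 'x'), (3, 'y')]
print list(gen_pairs())  # same result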

A Note on Syntax

• The parens on a generator expression can be dropped if used as a single function argument
• Example:

sum(x*x for x in s)

Interlude

• We now have two basic building blocks
• Generator functions:

def countdown(n):
    while n > 0:
        yield n
        n -= 1

• Generator expressions:

squares = (x*x for x in s)

• In both cases, we get an object that generates values (which are typically consumed in a for loop)

Processing Data Files
(Show me your Web Server Logs)

Programming Problem

Find out how many bytes of data were transferred by summing up the last column of data in this Apache web server log

81.107.39.38 - ... "GET /ply/ HTTP/1.1" 200 7587
81.107.39.38 - ... "GET /favicon.ico HTTP/1.1" 404 133
81.107.39.38 - ... "GET /ply/bookplug.gif HTTP/1.1" 200 23903
81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238
81.107.39.38 - ... "GET /ply/example.html HTTP/1.1" 200 2359
66.249.72.134 - ... "GET /index.html HTTP/1.1" 200 4447

Oh yeah, and the log file might be huge (Gbytes)

The Log File

• Each line of the log looks like this:

81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238

• The number of bytes is the last column

bytestr = line.rsplit(None,1)[1]

• It's either a number or a missing value (-)

81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -

• Converting the value

if bytestr != '-':
    bytes = int(bytestr)
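• Putting the two fragments together, the per-line work amounts to this small helper (a sketch, not on the slides; the solutions below simply inline the same logic, and treating '-' as 0 leaves the sum unchanged):

def bytes_in_line(line):
    # Last column of an Apache log line, or 0 for a missing value ('-')
    bytestr = line.rsplit(None, 1)[1]
    return int(bytestr) if bytestr != '-' else 0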

A Non-Generator Soln

• Just do a simple for-loop

wwwlog = open("access-log")
total = 0
for line in wwwlog:
    bytestr = line.rsplit(None,1)[1]
    if bytestr != '-':
        total += int(bytestr)
print "Total", total

• We read line-by-line and just update a sum
• However, that's so 90s...

Part 3: Pipelines

A Generator Solution

• Let's use some generator expressions

wwwlog     = open("access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
bytes      = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)

• Whoa! That's different!
• Less code
• A completely different programming style


Generators as a Pipeline

• To understand the solution, think of it as a data processing pipeline

access-log -> wwwlog -> bytecolumn -> bytes -> sum() -> total

• Each step is defined by iteration/generation

wwwlog     = open("access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
bytes      = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)

Being Declarative

• At each step of the pipeline, we declare an operation that will be applied to the entire input stream

access-log -> wwwlog -> bytecolumn -> bytes -> sum() -> total

bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)

• This operation gets applied to every line of the log file

Iteration is the Glue

• The glue that holds the pipeline together is the iteration that occurs in each step

wwwlog = open("access-log")

bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)

bytes = (int(x) for x in bytecolumn if x != '-')

print "Total", sum(bytes)

• The calculation is being driven by the last step
• The sum() function is consuming values being pushed through the pipeline (via .next() calls)
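• Concretely, sum() just keeps calling .next() on the bytes generator until StopIteration. A rough Python 2 equivalent of that final step (illustrative only; you would run it in place of sum(), since the pipeline can be consumed only once):

total = 0
it = iter(bytes)             # 'bytes' is the final generator expression above
while True:
    try:
        total += it.next()   # each .next() pulls one value through the whole pipeline
    except StopIteration:    # raised once the log file is exhausted
        break
print "Total", total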

Being Declarative

• Instead of focusing on the problem at a line-by-line level, you just break it down into big operations that operate on the whole file
• This is very much a "declarative" style
• The key : Think big...

Performance

• Surely, this generator approach has all sorts of fancy-dancy magic that is slow.
• Let's check it out on a 1.3Gb log file...

% ls -l big-access-log
-rw-r--r--  beazley  1303238000 Feb 29 08:06 big-access-log

Performance Contest

• The simple for-loop version:

wwwlog = open("big-access-log")
total = 0
for line in wwwlog:
    bytestr = line.rsplit(None,1)[1]
    if bytestr != '-':
        total += int(bytestr)
print "Total", total
                                        Time: 27.20

• The generator version:

wwwlog     = open("big-access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
bytes      = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)
                                        Time: 25.96

Performance Contest

wwwlog     = open("access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
bytes      = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)
                                        Time: 25.96

% awk '{ total += $NF } END { print total }' big-access-log
                                        Time: 37.33

Note: extracting the last column may not be awk's strong point

Commentary

• Not only was it not slow, it was 5% faster
• And it was less code
• And it was relatively easy to read
• And frankly, I like it a whole lot better...

"Back in the old days, we used AWK for this and we liked it. Oh, yeah, and get off my lawn!"

Food for Thought

• At no point in our generator solution did we ever create large temporary lists
• Thus, not only is that solution faster, it can be applied to enormous data files
• It's competitive with traditional tools

More Thoughts

• The generator solution was based on the concept of pipelining data between different components
• What if you had more advanced kinds of components to work with?
• Perhaps you could perform different kinds of processing by just plugging various pipeline components together

This Sounds Familiar

• The Unix philosophy
• Have a collection of useful system utils
• Can hook these up to files or each other
• Perform complex tasks by piping data

Fun with Files and Directories

Programming Problem

You have hundreds of web server logs scattered across various directories. In addition, some of the logs are compressed. Modify the last program so that you can easily read all of these logs

foo/
    access-log-012007.gz
    access-log-022007.gz
    access-log-032007.gz
    ...
    access-log-012008
bar/
    access-log-092007.bz2
    ...
    access-log-022008

os.walk()

• A very useful function for searching the file system

import os

for path, dirlist, filelist in os.walk(topdir):
    # path     : Current directory
    # dirlist  : List of subdirectories
    # filelist : List of files
    ...

• This utilizes generators to recursively walk through the file system

find

• Generate all filenames in a directory that match a given filename pattern

import os
import fnmatch

def gen_find(filepat,top):
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist,filepat):
            yield os.path.join(path,name)

• Examples:

pyfiles = gen_find("*.py","/")
logs    = gen_find("access-log*","/usr/www/")

Performance Contest

pyfiles = gen_find("*.py","/")
for name in pyfiles:
    print name
                                        Wall Clock Time: 559s

% find / -name '*.py'
                                        Wall Clock Time: 468s

Performed on a 750GB file system containing about 140000 .py files

grep

• Generate a sequence of lines that contain a given regular expression

import re

def gen_grep(pat, lines):
    patc = re.compile(pat)
    for line in lines:
        if patc.search(line):
            yield line

• Example:

lognames = gen_find("access-log*", "/usr/www")
logfiles = gen_open(lognames)
loglines = gen_cat(logfiles)
patlines = gen_grep(pat, loglines)
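• The example above (and the larger one below) also relies on gen_open() and gen_cat(), which this edited deck never defines; in the original tutorial they open each file (handling compressed logs) and chain the resulting line sequences together. A minimal sketch consistent with how they are used here:

import gzip, bz2

def gen_open(filenames):
    # Open a sequence of filenames one at a time, handling compression by extension
    for name in filenames:
        if name.endswith(".gz"):
            yield gzip.open(name)
        elif name.endswith(".bz2"):
            yield bz2.BZ2File(name)
        else:
            yield open(name)

def gen_cat(sources):
    # Concatenate items from one or more sources into a single sequence
    for s in sources:
        for item in s:
            yield item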

Example

• Find out how many bytes transferred for a specific pattern in a whole directory of logs

pat    = r"somepattern"
logdir = "/some/dir/"

filenames  = gen_find("access-log*",logdir)
logfiles   = gen_open(filenames)
loglines   = gen_cat(logfiles)
patlines   = gen_grep(pat,loglines)
bytecolumn = (line.rsplit(None,1)[1] for line in patlines)
bytes      = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)

Important Concept

• Generators decouple iteration from the code that uses the results of the iteration
• In the last example, we're performing a calculation on a sequence of lines
• It doesn't matter where or how those lines are generated
• Thus, we can plug any number of components together up front as long as they eventually produce a line sequence

Part 4: Coroutines

Yield as an Expression

• In Python 2.5, a slight modification to the yield statement was introduced (PEP-342)
• You could now use yield as an expression
• For example, on the right side of an assignment

# grep.py
def grep(pattern):
    print "Looking for %s" % pattern
    while True:
        line = (yield)
        if pattern in line:
            print line,

• Question : What is its value?

Coroutines

• If you use yield more generally, you get a coroutine
• These do more than just generate values
• Instead, functions can consume values sent to it.

>>> g = grep("python")
>>> g.next()    # Prime it (explained shortly)
Looking for python
>>> g.send("Yeah, but no, but yeah, but no")
>>> g.send("A series of tubes")
>>> g.send("python generators rock!")
python generators rock!
>>>

• Sent values are returned by (yield)

Coroutine Execution

• Execution is the same as for a generator
• When you call a coroutine, nothing happens
• They only run in response to next() and send() methods

>>> g = grep("python")       # Notice that no output was produced
>>> g.next()                 # On first operation, coroutine starts running
Looking for python
>>>

Coroutine Priming

• All coroutines must be "primed" by first calling .next() (or send(None))
• This advances execution to the location of the first yield expression.

def grep(pattern):
    print "Looking for %s" % pattern
    while True:                  # .next() advances the coroutine
        line = (yield)           # to the first yield expression
        if pattern in line:
            print line,

• At this point, it's ready to receive a value

Using a Decorator

• Remembering to call .next() is easy to forget
• Solved by wrapping coroutines with a decorator

# coroutine.py
def coroutine(func):
    def start(*args,**kwargs):
        cr = func(*args,**kwargs)
        cr.next()
        return cr
    return start

@coroutine
def grep(pattern):
    ...

• I will use this in most of the future examples
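• With the decorator in place, the priming call happens inside start(), so a coroutine is ready the moment you create it (a quick illustration, not on the slides):

>>> g = grep("python")            # the decorator already called .next()
Looking for python
>>> g.send("no snakes here")
>>> g.send("python is primed and ready")
python is primed and ready
>>>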


Closing a Coroutine

• A coroutine might run indefinitely
• Use .close() to shut it down

>>> g = grep("python")
>>> g.next()    # Prime it
Looking for python
>>> g.send("Yeah, but no, but yeah, but no")
>>> g.send("A series of tubes")
>>> g.send("python generators rock!")
python generators rock!
>>> g.close()

• Note: Garbage collection also calls close()

Catching close()

• close() can be caught (GeneratorExit)

# grepclose.py
@coroutine
def grep(pattern):
    print "Looking for %s" % pattern
    try:
        while True:
            line = (yield)
            if pattern in line:
                print line,
    except GeneratorExit:
        print "Going away. Goodbye"

• You cannot ignore this exception
• Only legal action is to clean up and return

Throwing an Exception

• Exceptions can be thrown inside a coroutine

>>> g = grep("python")
>>> g.next()            # Prime it
Looking for python
>>> g.send("python generators rock!")
python generators rock!
>>> g.throw(RuntimeError,"You're hosed")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in grep
RuntimeError: You're hosed
>>>

• Exception originates at the yield expression
• Can be caught/handled in the usual ways

A Bogus Example

• A "generator" that produces and receives values

# bogus.py
def countdown(n):
    print "Counting down from", n
    while n >= 0:
        newvalue = (yield n)
        # If a new value got sent in, reset n with it
        if newvalue is not None:
            n = newvalue
        else:
            n -= 1

• It runs, but it's "flaky" and hard to understand

c = countdown(5)
for n in c:
    print n
    if n == 5:
        c.send(3)

output: 5 2 1 0      # Notice how a value got "lost" in the iteration protocol

Interlude

• Despite some similarities, Generators and coroutines are basically two different concepts
• Generators produce values
• Coroutines tend to consume values
• It is easy to get sidetracked because methods meant for coroutines are sometimes described as a way to tweak generators that are in the process of producing an iteration pattern (i.e., resetting its value). This is mostly bogus.

Keeping it Straight

• Generators produce data for iteration
• Coroutines are consumers of data
• To keep your brain from exploding, you don't mix the two concepts together
• Coroutines are not related to iteration
• Note : There is a use of having yield produce a value in a coroutine, but it's not tied to iteration.

Part 5: General Pipelines
(Coroutines, Pipelines, and Dataflow)

Processing Pipelines

• Coroutines can be used to set up pipes

coroutine --send()--> coroutine --send()--> coroutine

• You just chain coroutines together and push data through the pipe with send() operations

Pipeline Sources

• The pipeline needs an initial source (a producer)

source --send()--> coroutine --send()--> ...

• The source drives the entire pipeline

def source(target):
    while not done:
        item = produce_an_item()
        ...
        target.send(item)
        ...
    target.close()

• It is typically not a coroutine

Pipeline Sinks

• The pipeline must have an end-point (sink)

... --send()--> coroutine --send()--> sink

• Collects all data sent to it and processes it

@coroutine
def sink():
    try:
        while True:
            item = (yield)        # Receive an item
            ...
    except GeneratorExit:         # Handle .close()
        # Done
        ...

An Example

• A source that mimics Unix 'tail -f'

# cofollow.py
import time
def follow(thefile, target):
    thefile.seek(0,2)      # Go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)    # Sleep briefly
            continue
        target.send(line)

• A sink that just prints the lines

@coroutine
def printer():
    while True:
        line = (yield)
        print line,

• Hooking it together

f = open("access-log")
follow(f, printer())

• A picture

follow() --send()--> printer()

• Critical point : follow() is driving the entire computation by reading lines and pushing them into the printer() coroutine

Pipeline Filters

• Intermediate stages both receive and send

--send()--> coroutine --send()-->

• Typically perform some kind of data transformation, filtering, routing, etc.

@coroutine
def filter(target):
    while True:
        item = (yield)       # Receive an item
        # Transform/filter item
        ...
        # Send it along to the next stage
        target.send(item)

A Filter Example

• A grep filter coroutine

# copipe.py
@coroutine
def grep(pattern,target):
    while True:
        line = (yield)           # Receive a line
        if pattern in line:
            target.send(line)    # Send to next stage

• Hooking it up

f = open("access-log")
follow(f, grep('python', printer()))

• A picture

follow() --send()--> grep() --send()--> printer()

Interlude

• Coroutines flip generators around

generators/iteration:
    input sequence --> generator --> generator --> generator --> for x in s:

coroutines:
    source --send()--> coroutine --send()--> coroutine

• Key difference. Generators pull data through the pipe with iteration. Coroutines push data into the pipeline with send().

Being Branchy

• With coroutines, you can send data to multiple destinations

                              +--send()--> coroutine
source --send()--> coroutine -+
                              +--send()--> coroutine

• The source simply "sends" data. Further routing of that data can be arbitrarily complex

Example : Broadcasting

• Broadcast to multiple targets

# cobroadcast.py
@coroutine
def broadcast(targets):
    while True:
        item = (yield)
        for target in targets:
            target.send(item)

• This takes a sequence of coroutines (targets) and sends received items to all of them.

Example : Broadcasting

• Example use:

f = open("access-log")
follow(f,
       broadcast([grep('python',printer()),
                  grep('ply',printer()),
                  grep('swig',printer())])
       )

                        +--> grep('python') --> printer()
follow --> broadcast ---+--> grep('ply')    --> printer()
                        +--> grep('swig')   --> printer()

Example : Broadcasting

• A more disturbing variation...

# cobroadcast2.py
f = open("access-log")
p = printer()
follow(f,
       broadcast([grep('python',p),
                  grep('ply',p),
                  grep('swig',p)])
       )

                        +--> grep('python') --+
follow --> broadcast ---+--> grep('ply')    --+--> printer()
                        +--> grep('swig')   --+

Interlude

• Coroutines provide more powerful data routing possibilities than simple iterators
• If you built a collection of simple data processing components, you can glue them together into complex arrangements of pipes, branches, merging, etc.
• Although there are some limitations (later)

Part 6: Tasks

A Digression

• In preparing this tutorial, I found myself wishing that variable assignment was an expression

@coroutine
def printer():
    while True:
        line = (yield)
        print line,

                  vs.

@coroutine
def printer():
    while (line = yield):
        print line,

• However, I'm not holding my breath on that...
• Actually, I'm expecting to be flogged with a rubber chicken for even suggesting it.

The Task Concept

• In concurrent programming, one typically subdivides problems into "tasks"
• Tasks have a few essential features:
  • Independent control flow
  • Internal state
  • Can be scheduled (suspended/resumed)
  • Can communicate with other tasks
• Claim : Coroutines are tasks

Are Coroutines Tasks?

• Let's look at the essential parts
• Coroutines have their own control flow.

@coroutine
def grep(pattern):               # a sequence of statements
    print "Looking for %s" % pattern
    while True:
        line = (yield)
        if pattern in line:
            print line,

• A coroutine is just a sequence of statements like any other Python function

• Coroutines have their own internal state
• For example : local variables

@coroutine
def grep(pattern):               # pattern and line are locals
    print "Looking for %s" % pattern
    while True:
        line = (yield)
        if pattern in line:
            print line,

• The locals live as long as the coroutine is active
• They establish an execution environment

Are Coroutines Tasks?

• Coroutines can communicate
• The .send() method sends data to a coroutine

@coroutine
def grep(pattern):
    print "Looking for %s" % pattern
    while True:
        line = (yield)        # <-- send(msg) arrives here
        if pattern in line:
            print line,

• yield expressions receive input

Are Coroutines Tasks?

• Coroutines can be suspended and resumed
• yield suspends execution
• send() resumes execution
• close() terminates execution

I'm Convinced

• Very clearly, coroutines look like tasks
• But they're not tied to threads
• Or subprocesses
• A question : Can you perform multitasking without using either of those concepts?
• Multitasking using nothing but coroutines?

A Crash Course in Operating Systems

Program Execution

• On a CPU, a program is a series of instructions

int main() {
    int i, total = 0;
    for (i = 0; i < 10; i++)
    {
        total += i;
    }
}

compiled (cc) to:

_main:
    pushl %ebp
    movl  %esp, %ebp
    subl  $24, %esp
    movl  $0, -12(%ebp)
    movl  $0, -16(%ebp)
    jmp   L2
L3: movl  -16(%ebp), %eax
    leal  -12(%ebp), %edx
    addl  %eax, (%edx)
    leal  -16(%ebp), %eax
    incl  (%eax)
L2: cmpl  $9, -16(%ebp)
    jle   L3
    leave
    ret

• When running, there is no notion of doing more than one thing at a time (or any kind of task switching)


The Multitasking Problem

• CPUs don't know anything about multitasking
• Nor do application programs
• Well, surely something has to know about it!
• Hint: It's the operating system

Operating Systems

• As you hopefully know, the operating system (e.g., Linux, Windows) is responsible for running programs on your machine
• And as you have observed, the operating system does allow more than one process to execute at once (e.g., multitasking)
• It does this by rapidly switching between tasks
• Question : How does it do that?

A Conundrum

• When a CPU is running your program, it is not running the operating system
• Question: How does the operating system (which is not running) make an application (which is running) switch to another task?
• The "context-switching" problem...

Interrupts and Traps

• There are usually only two mechanisms that an operating system uses to gain control:
• Interrupts - Some kind of hardware related signal (data received, timer, keypress, etc.)
• Traps - A software generated signal
• In both cases, the CPU briefly suspends what it is doing, and runs code that's part of the OS
• It is at this time the OS might switch tasks

Traps and System Calls

• Low-level system calls are actually traps
• It is a special CPU instruction

read(fd,buf,nbytes)

read:
    push  %ebx
    mov   0x10(%esp),%edx
    mov   0xc(%esp),%ecx
    mov   0x8(%esp),%ebx
    mov   $0x3,%eax
    int   $0x80          <-- trap
    pop   %ebx
    ...

• When a trap instruction executes, the program suspends execution at that point
• And the OS takes over

High Level Overview

• Traps are what make an OS work
• The OS drops your program on the CPU
• It runs until it hits a trap (system call)
• The program suspends and the OS runs
• Repeat

        trap          trap          trap
run -----+----- run -----+----- run -----+----- run
         |               |               |
     OS executes     OS executes     OS executes

Task Switching

• Here's what typically happens when an OS runs multiple tasks.

            trap                          trap
Task A: run --+   (task switch)            +-- run ...
              |                            |
Task B:       +-- run --trap-- (switch) ---+

• On each trap, the system switches to a different task (cycling between them)

Task Scheduling

• To run many tasks, add a bunch of queues

Ready Queue:  [task] [task] [task] ...  --->  Running: [task] on CPU
Wait Queues:  [task] [task] ...         <---  traps

An Insight

• The yield statement is a kind of "trap"
• No really!
• When a generator function hits a "yield" statement, it immediately suspends execution
• Control is passed back to whatever code made the generator function run (unseen)
• If you treat yield as a trap, you can build a multitasking "operating system" -- all in Python!
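• The idea can be seen in miniature before building the real scheduler: each yield hands control back to whoever called next(), so a plain loop can interleave two generators (a toy illustration, not from the slides):

def task_a():
    while True:
        print "A"
        yield              # "trap": suspend and hand control back to the caller

def task_b():
    while True:
        print "B"
        yield

a, b = task_a(), task_b()
for _ in xrange(3):        # a trivial round-robin "scheduler"
    a.next()
    b.next()
# prints A B A B A B, one letter per line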


Part 7: A Mini OS

Let's Build an Operating System
(You may want to put on your 5-point safety harness)

Our Challenge

• Build a multitasking "operating system"
• Use nothing but pure Python code
• No threads
• No subprocesses
• Use generators/coroutines

Some Motivation

• There has been a lot of recent interest in alternatives to threads (especially due to the GIL)
• Non-blocking and asynchronous I/O
• Example: servers capable of supporting thousands of simultaneous client connections
• A lot of work has focused on event-driven systems or the "Reactor Model" (e.g., Twisted)
• Coroutines are a whole different twist...

Step 1: Define Tasks

• A task object

# pyos1.py
class Task(object):
    taskid = 0
    def __init__(self,target):
        Task.taskid += 1
        self.tid     = Task.taskid   # Task ID
        self.target  = target        # Target coroutine
        self.sendval = None          # Value to send
    def run(self):
        return self.target.send(self.sendval)

• A task is a wrapper around a coroutine
• There is only one operation : run()

Task Example

• Here is how this wrapper behaves

# A very simple generator
def foo():
    print "Part 1"
    yield
    print "Part 2"
    yield

>>> t1 = Task(foo())     # Wrap in a Task
>>> t1.run()
Part 1
>>> t1.run()
Part 2
>>>

• run() executes the task to the next yield (a trap)


Step 2: The Scheduler

class Scheduler(object):                  # pyos2.py
    def __init__(self):
        self.ready   = Queue()
        self.taskmap = {}

    def new(self,target):
        newtask = Task(target)
        self.taskmap[newtask.tid] = newtask
        self.schedule(newtask)
        return newtask.tid

    def schedule(self,task):
        self.ready.put(task)

    def mainloop(self):
        while self.taskmap:
            task = self.ready.get()
            result = task.run()
            self.schedule(task)

• ready is a queue of tasks that are ready to run
• taskmap is a dictionary that keeps track of all active tasks (each task has a unique integer task ID) (more later)
• new() introduces a new task to the scheduler
• schedule() puts a task onto the ready queue; this makes it available to run
• mainloop() is the main scheduler loop; it pulls tasks off the queue and runs them to the next yield


First Multitasking
• Two tasks:

def foo():
    while True:
        print "I'm foo"
        yield

def bar():
    while True:
        print "I'm bar"
        yield

• Running them into the scheduler

sched = Scheduler()
sched.new(foo())
sched.new(bar())
sched.mainloop()


First Multitasking
• Example output:

I'm foo
I'm bar
I'm foo
I'm bar
I'm foo
I'm bar

• Emphasize: yield is a trap
• Each task runs until it hits the yield
• At this point, the scheduler regains control and switches to the other task


Problem: Task Termination
• The scheduler crashes if a task returns

def foo():                                # taskcrash.py
    for i in xrange(10):
        print "I'm foo"
        yield

...
I'm foo
I'm bar
I'm foo
I'm bar
Traceback (most recent call last):
  File "crash.py", line 20, in <module>
    sched.mainloop()
  File "scheduler.py", line 26, in mainloop
    result = task.run()
  File "task.py", line 13, in run
    return self.target.send(self.sendval)
StopIteration
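A quick interactive sketch of why this happens (added for illustration, not from the original slides): once a generator function returns, any further send() raises StopIteration, and nothing in the scheduler catches it yet.

>>> def foo():
...     yield
...
>>> g = foo()
>>> g.next()              # runs to the yield
>>> g.send(None)          # the function body finishes -> StopIteration escapes
Traceback (most recent call last):
  ...
StopIteration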

Step 3: Task Exit

class Scheduler(object):                  # pyos3.py
    ...
    def exit(self,task):
        print "Task %d terminated" % task.tid
        del self.taskmap[task.tid]
    ...
    def mainloop(self):
        while self.taskmap:
            task = self.ready.get()
            try:
                result = task.run()
            except StopIteration:
                self.exit(task)
                continue
            self.schedule(task)

• exit() removes the task from the scheduler's task map
• The mainloop catches task exit (StopIteration) and cleans up


Second Multitasking
• Two tasks:

def foo():
    for i in xrange(10):
        print "I'm foo"
        yield

def bar():
    for i in xrange(5):
        print "I'm bar"
        yield

• Running them into the scheduler

sched = Scheduler()
sched.new(foo())
sched.new(bar())
sched.mainloop()


Second Multitasking
• Sample output:

I'm foo
I'm bar
I'm foo
I'm bar
I'm foo
I'm bar
I'm foo
I'm bar
I'm foo
I'm bar
I'm foo
Task 2 terminated
I'm foo
I'm foo
I'm foo
I'm foo
Task 1 terminated

System Calls
• In a real operating system, traps are how application programs request the services of the operating system (syscalls)
• In our code, the scheduler is the operating system and the yield statement is a trap
• To request the service of the scheduler, tasks will use the yield statement with a value
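Underneath, this relies on the two-way nature of yield. A minimal sketch (added here, not from the original slides): the value given to yield is returned to whoever resumes the generator, and the value passed to send() comes back as the result of the yield expression.

def task():
    result = yield "some request"         # yield a "system call" object
    print "task got back:", result

t = task()
request = t.next()                        # runs to the yield; request == "some request"
print "scheduler saw:", request
try:
    t.send("a return value")              # delivers the result into the task
except StopIteration:
    pass                                  # the task finished after printing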


Step 4: System Calls

class SystemCall(object):                 # pyos4.py
    def handle(self):
        pass

class Scheduler(object):
    ...
    def mainloop(self):
        while self.taskmap:
            task = self.ready.get()
            try:
                result = task.run()
                if isinstance(result,SystemCall):
                    result.task  = task
                    result.sched = self
                    result.handle()
                    continue
            except StopIteration:
                self.exit(task)
                continue
            self.schedule(task)

• SystemCall is the system call base class; all system operations will be implemented by inheriting from this class
• The mainloop looks at the result yielded by the task; if it's a SystemCall, it does some setup and runs the system call on behalf of the task
• The task and sched attributes hold information about the environment (current task and scheduler)



A First System Call
• Return a task's ID number

class GetTid(SystemCall):
    def handle(self):
        self.task.sendval = self.task.tid
        self.sched.schedule(self.task)

• The operation of this is a little subtle

class Task(object):
    ...
    def run(self):
        return self.target.send(self.sendval)

• The sendval attribute of a task is like a return value from a system call. Its value is sent into the task when it runs again.


A First System Call
• Example of using a system call

def foo():
    mytid = yield GetTid()
    for i in xrange(5):
        print "I'm foo", mytid
        yield

def bar():
    mytid = yield GetTid()
    for i in xrange(10):
        print "I'm bar", mytid
        yield

sched = Scheduler()
sched.new(foo())
sched.new(bar())
sched.mainloop()


A First System Call
• Example output (notice each task has a different task id):

I'm foo 1
I'm bar 2
I'm foo 1
I'm bar 2
I'm foo 1
I'm bar 2
I'm foo 1
I'm bar 2
I'm foo 1
I'm bar 2
Task 1 terminated
I'm bar 2
I'm bar 2
I'm bar 2
I'm bar 2
I'm bar 2
Task 2 terminated

Design Discussion
• Real operating systems have a strong notion of "protection" (e.g., memory protection)
• Application programs are not strongly linked to the OS kernel (traps are the only interface)
• For sanity, we are going to emulate this
• Tasks do not see the scheduler
• Tasks do not see other tasks
• yield is the only external interface


Step 5: Task Management

• Let's make some more system calls
• Some task management functions:
• Create a new task
• Kill an existing task
• Wait for a task to exit
• These mimic common operations with threads or processes


Creating New Tasks
• Create another system call

class NewTask(SystemCall):                # pyos5.py
    def __init__(self,target):
        self.target = target
    def handle(self):
        tid = self.sched.new(self.target)
        self.task.sendval = tid
        self.sched.schedule(self.task)

• Example use:

def bar():
    while True:
        print "I'm bar"
        yield

def sometask():
    ...
    t1 = yield NewTask(bar())

Killing Tasks
• More system calls

class KillTask(SystemCall):
    def __init__(self,tid):
        self.tid = tid
    def handle(self):
        task = self.sched.taskmap.get(self.tid,None)
        if task:
            task.target.close()
            self.task.sendval = True
        else:
            self.task.sendval = False
        self.sched.schedule(self.task)

• Example use:

def sometask():
    t1 = yield NewTask(foo())
    ...
    yield KillTask(t1)
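The close() call is what actually terminates the target. A small sketch of the underlying generator behavior (added for illustration, not from the original slides): close() raises GeneratorExit inside the generator at its current yield, so the task unwinds and finishes.

def foo():
    try:
        while True:
            print "I'm foo"
            yield
    except GeneratorExit:
        print "foo is being killed"

g = foo()
g.next()          # prints "I'm foo", suspends at the yield
g.close()         # prints "foo is being killed"; the generator is now finished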


An Example
• An example of basic task control

def foo():
    mytid = yield GetTid()
    while True:
        print "I'm foo", mytid
        yield

def main():
    child = yield NewTask(foo())          # Launch new task
    for i in xrange(5):
        yield
    yield KillTask(child)                 # Kill the task
    print "main done"

sched = Scheduler()
sched.new(main())
sched.mainloop()

An Example
• Sample output:

I'm foo 2
I'm foo 2
I'm foo 2
I'm foo 2
I'm foo 2
Task 2 terminated
main done
Task 1 terminated


Waiting for Tasks
• This is a more tricky problem...

def foo():
    for i in xrange(5):
        print "I'm foo"
        yield

def main():
    child = yield NewTask(foo())
    print "Waiting for child"
    yield WaitTask(child)
    print "Child done"

• The task that waits has to remove itself from the run queue--it sleeps until the child exits
• This requires some scheduler changes

Task Waiting

class Scheduler(object):                  # pyos6.py
    def __init__(self):
        ...
        self.exit_waiting = {}
        ...
    def exit(self,task):
        print "Task %d terminated" % task.tid
        del self.taskmap[task.tid]
        # Notify other tasks waiting for exit
        for task in self.exit_waiting.pop(task.tid,[]):
            self.schedule(task)

    def waitforexit(self,task,waittid):
        if waittid in self.taskmap:
            self.exit_waiting.setdefault(waittid,[]).append(task)
            return True
        else:
            return False

• exit_waiting is a holding area for tasks that are waiting: a dict mapping task IDs to the tasks waiting for their exit
• When a task exits, we pop the list of all waiting tasks out of the waiting area and reschedule them
• waitforexit() is a utility method that makes a task wait for another task; it puts the task in the waiting area


Task Waiting
• Here is the system call

class WaitTask(SystemCall):
    def __init__(self,tid):
        self.tid = tid
    def handle(self):
        result = self.sched.waitforexit(self.task,self.tid)
        self.task.sendval = result
        # If waiting for a non-existent task,
        # return immediately without waiting
        if not result:
            self.sched.schedule(self.task)

• Note: you have to be careful with error handling
• The last bit immediately reschedules the task if the task being waited for doesn't exist


Task Waiting Example
• Here is some example code:

def foo():
    for i in xrange(5):
        print "I'm foo"
        yield

def main():
    child = yield NewTask(foo())
    print "Waiting for child"
    yield WaitTask(child)
    print "Child done"

Task Waiting Example
• Sample output:

Waiting for child
I'm foo 2
I'm foo 2
I'm foo 2
I'm foo 2
I'm foo 2
Task 2 terminated
Child done
Task 1 terminated


Design Discussion
• The only way for tasks to refer to other tasks is by using the integer task ID assigned by the scheduler
• This is an encapsulation and safety strategy
• It keeps tasks separated (no linking to internals)
• It places all task management in the scheduler (which is where it properly belongs)

Interlude

• Running multiple tasks. Check.
• Launching new tasks. Check.
• Some basic task management. Check.
• The next step is obvious
• We must implement a web framework...
• ... or maybe just an echo server to start.


An Echo Server Attempt

def handle_client(client,addr):           # echobad.py
    print "Connection from", addr
    while True:
        data = client.recv(65536)
        if not data:
            break
        client.send(data)
    client.close()
    print "Client closed"
    yield         # Make the function a generator/coroutine

def server(port):
    print "Server starting"
    sock = socket(AF_INET,SOCK_STREAM)
    sock.bind(("",port))
    sock.listen(5)
    while True:
        client,addr = sock.accept()
        yield NewTask(handle_client(client,addr))

• handle_client() is the client handler; each client will be executing this task (in theory)
• server() is the main server loop: wait for a connection, launch a new task to handle each client


Echo Server Example
• Execution test

def alive():
    while True:
        print "I'm alive!"
        yield

sched = Scheduler()
sched.new(alive())
sched.new(server(45000))
sched.mainloop()

• Output:

I'm alive!
Server starting
... (freezes) ...

• The scheduler locks up and never runs any more tasks (bummer)

Blocking Operations
• In the example, various I/O operations block

client,addr = sock.accept()
data = client.recv(65536)
client.send(data)

• The real operating system (e.g., Linux) suspends the entire Python interpreter until the I/O operation completes
• Clearly this is pretty undesirable for our multitasking operating system (any blocking operation freezes the whole program)


Non-blocking I/O
• The select module can be used to monitor a collection of sockets (or files) for activity

reading = []    # List of sockets waiting for read
writing = []    # List of sockets waiting for write

# Poll for I/O activity
r,w,e = select.select(reading,writing,[],timeout)

# r is list of sockets with incoming data
# w is list of sockets ready to accept outgoing data
# e is list of sockets with an error state

• This can be used to add I/O support to our OS
• This is going to be similar to task waiting
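For concreteness, here is a small standalone polling loop (added as an illustration, not part of the scheduler code; it assumes the same port 45000 used later): accept() is only called once select() reports the listening socket as readable, so the loop never blocks for more than the timeout.

import select
from socket import socket, AF_INET, SOCK_STREAM

sock = socket(AF_INET, SOCK_STREAM)
sock.bind(("", 45000))
sock.listen(5)

while True:
    # Wait up to 1 second for the listening socket to become readable
    r, w, e = select.select([sock], [], [], 1.0)
    if sock in r:
        client, addr = sock.accept()      # should not block now
        print "Connection from", addr
        client.close()
    else:
        print "no connections yet"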

Copyright (C) 2009, David Beazley, http://www.dabeaz.com 158 Blocking Operations • In the example various I/O operations block client,addr = sock.accept() data = client.recv(65536) client.send(data) • The real operating system (e.g., Linux) suspends the entire Python interpreter until the I/O operation completes • Clearly this is pretty undesirable for our multitasking operating system (any blocking operation freezes the whole program)

Copyright (C) 2009, David Beazley, http://www.dabeaz.com 157

Step 6: I/O Waiting

class Scheduler(object):                  # pyos7.py
    def __init__(self):
        ...
        self.read_waiting  = {}
        self.write_waiting = {}
        ...
    def waitforread(self,task,fd):
        self.read_waiting[fd] = task
    def waitforwrite(self,task,fd):
        self.write_waiting[fd] = task

    def iopoll(self,timeout):
        if self.read_waiting or self.write_waiting:
            r,w,e = select.select(self.read_waiting,
                                  self.write_waiting,[],timeout)
            for fd in r: self.schedule(self.read_waiting.pop(fd))
            for fd in w: self.schedule(self.write_waiting.pop(fd))
    ...

• read_waiting and write_waiting are holding areas for tasks blocking on I/O: dictionaries mapping file descriptors to tasks
• waitforread() and waitforwrite() simply put a task into one of the above dictionaries
• iopoll() is the I/O polling step: use select() to determine which file descriptors can be used and unblock any associated task


When to Poll?
• Polling is actually somewhat tricky.
• You could put it in the main event loop

class Scheduler(object):
    ...
    def mainloop(self):
        while self.taskmap:
            self.iopoll(0)
            task = self.ready.get()
            try:
                result = task.run()

• Problem: This might cause excessive polling
• Especially if there are a lot of pending tasks already on the ready queue

A Polling Task
• An alternative: put I/O polling in its own task

class Scheduler(object):
    ...
    def iotask(self):
        while True:
            if self.ready.empty():
                self.iopoll(None)
            else:
                self.iopoll(0)
            yield

    def mainloop(self):
        self.new(self.iotask())           # Launch I/O polls
        while self.taskmap:
            task = self.ready.get()
            ...

• This just runs with every other task (neat)


Read/Write Syscalls
• Two new system calls

class ReadWait(SystemCall):               # pyos7.py
    def __init__(self,f):
        self.f = f
    def handle(self):
        fd = self.f.fileno()
        self.sched.waitforread(self.task,fd)

class WriteWait(SystemCall):
    def __init__(self,f):
        self.f = f
    def handle(self):
        fd = self.f.fileno()
        self.sched.waitforwrite(self.task,fd)

• These merely wait for I/O events, but do not actually perform any I/O

A New Echo Server
• All I/O operations are now preceded by a waiting system call

def handle_client(client,addr):           # echogood.py
    print "Connection from", addr
    while True:
        yield ReadWait(client)
        data = client.recv(65536)
        if not data:
            break
        yield WriteWait(client)
        client.send(data)
    client.close()
    print "Client closed"

def server(port):
    print "Server starting"
    sock = socket(AF_INET,SOCK_STREAM)
    sock.bind(("",port))
    sock.listen(5)
    while True:
        yield ReadWait(sock)
        client,addr = sock.accept()
        yield NewTask(handle_client(client,addr))


Echo Server Example
• Execution test

def alive():
    while True:
        print "I'm alive!"
        yield

sched = Scheduler()
sched.new(alive())
sched.new(server(45000))
sched.mainloop()

• You will find that it now works (will see alive messages printing and you can connect)
• Remove the alive() task to get rid of the messages (echogood2.py)
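To try it out, a tiny test client such as the following can be used (added as an illustration, not from the original slides; it assumes the server above is running on localhost port 45000):

from socket import socket, AF_INET, SOCK_STREAM

s = socket(AF_INET, SOCK_STREAM)
s.connect(("localhost", 45000))
s.send("Hello")
print s.recv(65536)       # the echo server should send "Hello" back
s.close()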

Congratulations!

• You have just created a multitasking OS
• Tasks can run concurrently
• Tasks can create, destroy, and wait for tasks
• Tasks can perform I/O operations
• You can even write a concurrent server
• Excellent!
