Course: Programming I, 2020 Week 9 – Final week

Per Starbäck

All functions this time should be in a file week9.py. All handling of files should use the ways taught this week.

6704, 24566, 25974, 27586, 42073, ...

Write a function make_random_numbers() which creates a new file random-numbers.txt which contains 1000 lines with random numbers from 0 to 999,999, one on each line, sorted in numerical order, with the smallest number on the first line.

Two lines

Write a function two_lines_before(filename, text) which returns two values [1], the two lines preceding the first line in a file which has that text in it. (Use the operator in to determine that.)

The returned lines should be without any terminating newline characters. If the text is found in the first line, the two values should both be None. If the text is found in the second line, the first value should be None.

If there is no line with the text, this function should treat that as if it found the text on a line after all the lines in the file, that is, return the two last lines of the file.

Here are some examples using the chaplin.txt used in an earlier assignment, which is probably in your p1 folder since then.

    >>> r1, r2 = two_lines_before('chaplin.txt', 'Gold')
    >>> r1
    'The Pilgrim'
    >>> r2
    'A Woman of Paris'
    >>> two_lines_before('chaplin.txt', 'x')
    ('Modern Times', 'The Great Dictator')
    >>> two_lines_before('chaplin.txt', 'Auto')
    (None, 'Making a Living')
    >>> two_lines_before('chaplin.txt', 'ivi')
    (None, None)
    >>> two_lines_before('chaplin.txt', 'Duck')
    ('A King in New York', 'A Countess from Hong Kong')

[1] That is, actually a tuple with two items.

It should be possible to use this function for very large files, so don’t read the complete file into memory at once. Make sure that the function instead only remembers the two previous lines as it continues reading the file.

Titus, Totus & Tutus, etc.

At https://inducks.org is a world-wide fan-made database about Disney comics where you can look up, for example, which Disney comics have been published in what publications, and who made them.

[Figure: Donaldus Anas atque nox saraceni by Italian artist Marco Rota was the first Disney comics story published in Latin.]

You can download internal files with the database from https://inducks.org/inducks/, in particular isv.tgz there. Do so!

Shell tip

The file isv.tgz is a gzipped tar archive. (It could also have been called isv.tar.gz.) You can let the web browser extract the files in it for you. If you instead just save the tgz file you can extract the files from it in a Unix shell with

    tar -xf isv.tgz

The command tar handles tar archives like this. The option -x tells it to extract files, and the option -f with an argument tells it which file to extract from. The contents of the archive will be extracted in your current directory.

It contains a folder isv and one of the files there is inducks_charactername.isv with information on what different Disney comics characters are called in different languages. Take a look at the contents of it!

Each line of the file has fields that you can split with the split method. The first field gives a standardized code for the character (like DD for Donald Duck), and then other fields say what that character is called in some language.

Python tip

The .split method can have two arguments, where the second argument says what the maximum number of splits should be. So, for example, with

    line = 'aaa:bbb:ccc:ddd:eee'
    a, b, rest = line.split(':', 2)

a will be set to 'aaa', b to 'bbb' and rest to 'ccc:ddd:eee'. This can be useful when you only need some of the fields in a line.

Sometimes there are several names for the same character in the same language. For example there are these lines:

    BW^en^The Big Bad Wolf^Y^^
    BW^en^Zeke Wolf^N^^
    BW^en^The Bad Wolf^N^^
    BW^en^Zeke Midas Wolf^N^^

The fourth entry, which is always ‘Y’ or ‘N’, says if this is the preferred name or not. There is only one preferred name for each character in each language, in this case ‘The Big Bad Wolf’ in English. (The language code for English is en.)

Write a function trans_table(fromlang, tolang) which uses that file inducks_charactername.isv to print a translation table that shows what the same characters are called in two different languages.

The function should first read the database file line by line and create two dictionaries as it does so, then print its output in two columns.

• Only use names that are preferred!
• Only read through the database file once
• Print the table in two columns
• Only include characters that have entries for both languages
• Don’t include names that are the same in the two languages
• Sort the lines according to the names in the first language. (You can use sorted for that. [2])

[2] Ideally they should be sorted according to sorting conventions for that language. If you want to find a way to do that, you are welcome, but that is hard and very optional.

If given the language codes la (= Latin) and it (= Italian) the result could look something like this:

    Donaldus Anas      Paperino
    Míchaël Músculus   Topolino
    Scrúgulus          Zio Paperone
    Titus Totus Tutus  Qui Quo Qua

and if given in the other order it could look like:

    Paperino      Donaldus Anas
    Qui Quo Qua   Titus Totus Tutus
    Topolino      Míchaël Músculus
    Zio Paperone  Scrúgulus

These are very short results since there are so few Latin names in the database. Try your function with other languages!

When you format a string you can state how wide the field should be in characters. I haven’t talked about how to do this, but you can find how to do that with f-strings to get nice columns like in the example by adding padding to the right of the first name in the lines.

This assumes that all characters have the same width. If you have text in the first column that includes wide characters, like if you do trans_table('zh-hans', 'en'), it is harder to get nice-looking columns. It is ok if your program doesn’t produce nice-looking columns in that case. (The code zh-hans stands for Chinese written with Simplified Chinese characters.)
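The padding mentioned above can be done with the width part of an f-string format specification. The following is only a minimal sketch of that one formatting step, not a solution to the exercise: the helper name format_two_columns, its gap parameter and the hard-coded sample pairs are assumptions made for illustration, and reading the database file is left out entirely.

```python
# Hypothetical sample data, copied from the Latin/Italian example table.
pairs = [
    ("Donaldus Anas", "Paperino"),
    ("Míchaël Músculus", "Topolino"),
    ("Scrúgulus", "Zio Paperone"),
    ("Titus Totus Tutus", "Qui Quo Qua"),
]


def format_two_columns(pairs, gap=2):
    """Return one string per pair, with the first column padded on the right.

    The name and the gap parameter are illustrative assumptions.
    """
    if not pairs:
        return []
    # Width of the first column: the longest first name plus a small gap.
    width = max(len(first) for first, _ in pairs) + gap
    # In {first:<{width}} the '<' left-aligns the text and pads it with
    # spaces on the right until it is `width` characters long.
    return [f"{first:<{width}}{second}" for first, second in pairs]


for line in format_two_columns(pairs):
    print(line)
```

Note that len counts characters, not the space they take up on screen, which is why this kind of padding only lines up when all characters have the same width.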
