
Handout 8 CS602 –Data-Driven Development with –Spring’21 Page 1 of 6

Handout 8

Tuples, Sets and Dictionaries.

TUPLE

Tuples are like lists (ordered collections of objects), except that they are immutable. Use a tuple when you need an ordered collection that will not be changed.

• Tuples are created with commas (optionally enclosed in parentheses) or with the tuple() function:

>>> t = 3, 'foo'
>>> t
(3, 'foo')
>>> t = tuple('abcdef')
>>> t
('a', 'b', 'c', 'd', 'e', 'f')

• Empty tuples are constructed by an empty pair of parentheses.
• A tuple with one item is constructed by following a value with a comma (it is not sufficient to enclose a single value in parentheses).

>>> e = ()
>>> e
()
>>> s = 'hi'
>>> s
'hi'
>>> s = 'hi',
>>> s
('hi',)
>>> len(s)
1

• Sequence packing/unpacking:

>>> a = 45, 'g', 456.6
>>> a
(45, 'g', 456.6)
>>> b, c, d = a
>>> b
45
>>> c
'g'
>>> d
456.6

• Slicing:

>>> t = tuple('abcdef')
>>> t
('a', 'b', 'c', 'd', 'e', 'f')
>>> t[1:5:2]
('b', 'd')
>>> t[:-3]
('a', 'b', 'c')
>>> t[4:]
('e', 'f')

• Subtle point: the composition of a tuple cannot be changed (i.e. the references to the objects in a tuple stay the same), but if a tuple contains a mutable object, the content of that object can be changed.

>>> tu = 5, 'foo', [3, 4, 5]
>>> tu
(5, 'foo', [3, 4, 5])
>>> tu[0] = 6
TypeError: 'tuple' object does not support item assignment
>>> tu[2][0] = 1000
>>> tu
(5, 'foo', [1000, 4, 5])
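The points above can be tried together in one short script. This is only a minimal sketch; the names x, y, not_a_tuple, one_item and record are made up for illustration.

```python
# Packing and unpacking: swap two values without a temporary variable.
x, y = 1, 2
x, y = y, x                  # the right side is packed into a tuple, then unpacked

# Parentheses alone do not make a tuple; the trailing comma does.
not_a_tuple = ('hi')         # just the string 'hi'
one_item = ('hi',)           # a tuple of length 1

# A tuple slot cannot be reassigned, but a mutable object stored in it can change.
record = (5, 'foo', [3, 4, 5])
try:
    record[0] = 6
except TypeError as err:
    message = str(err)
record[2].append(6)          # the list inside the tuple is modified in place

print(x, y)                                                   # 2 1
print(type(not_a_tuple).__name__, type(one_item).__name__)    # str tuple
print(message)
print(record)                                                 # (5, 'foo', [3, 4, 5, 6])
```

The tuple still refers to the same list object after append; only the content of that list has changed.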

Methods applicable to tuples

t.count(e): int    returns the number of occurrences of e in tuple t
t.index(e): int    returns the smallest index of element e in tuple t; raises ValueError if e is not in t
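A brief sketch of both methods, using a made-up tuple named grades. Because index raises ValueError for a missing element, it is common either to test membership first or to catch the exception.

```python
grades = ('B', 'A', 'C', 'A', 'B', 'A')   # hypothetical sample data

a_count = grades.count('A')     # how many times 'A' occurs
first_a = grades.index('A')     # position of the first 'A'
print(a_count, first_a)         # 3 1

# Option 1: check membership before calling index.
first_f = grades.index('F') if 'F' in grades else None
print(first_f)                  # None

# Option 2: call index and handle the error.
try:
    grades.index('F')
    found = True
except ValueError:
    found = False
print(found)                    # False
```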

Functions applicable to tuples

any(t): bool             returns True if at least one element of the iterable is equivalent to True
all(t): bool             returns True if all elements of the iterable are equivalent to True
len(t): int              returns the length of an object
max(t)                   returns the largest element
min(t)                   returns the smallest element
sorted(t): list          returns a new sorted list from the given iterable
sum(t)                   returns the sum of the items of an iterable
zip(s1, s2, ...)         returns an iterator of tuples, each composed of the elements of the sequences s1, s2, ... at the same index

Example:

''' Iterating over zipped sequences '''
city = ['Boston', 'Paris', 'Lagos']
area = [11700, 17174, 2706]   # sq km
population = [4628910, 12405426, 21000000]

for ci, a, pop in zip(city, area, population):
    print('population density of', ci, 'is', round(pop / a, 1))
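The other functions listed above can be checked on a small tuple. The sketch below uses made-up values; it also shows that zip produces an iterator (wrap it in list() to see the tuples) and that it stops at the shortest input.

```python
t = (3, 0, 7, 5)

print(any(t))                            # True: at least one element is non-zero
print(all(t))                            # False: 0 is equivalent to False
print(len(t), max(t), min(t), sum(t))    # 4 7 0 15
print(sorted(t))                         # [0, 3, 5, 7]  (a new list; t is unchanged)

letters = ('a', 'b', 'c')
numbers = (1, 2)
pairs = list(zip(letters, numbers))      # materialize the iterator
print(pairs)                             # [('a', 1), ('b', 2)]
```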


SET

A set is an unordered collection with no duplicate elements. Sets are mutable and support adding and removing elements. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations like union, intersection, difference, and symmetric difference.

Create sets using curly braces {} or the set() function. Note: to create an empty set you have to use set(), not {}; the latter creates an empty dictionary.
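The difference between set() and {} is easy to verify. The following sketch, with made-up sample data, also shows the two basic uses named above: removing duplicates and testing membership.

```python
empty_set = set()
empty_dict = {}
print(type(empty_set).__name__)     # set
print(type(empty_dict).__name__)    # dict

# Duplicates are dropped when a set is built from a list.
words = ['to', 'be', 'or', 'not', 'to', 'be']   # sample data for illustration
unique_words = set(words)
print(len(words), len(unique_words))            # 6 4
print('or' in unique_words)                     # True

# set() over a string yields its distinct characters.
letters = set('banana')
print(sorted(letters))                          # ['a', 'b', 'n']
```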

Operations on sets: in, not in, len(), max(), min()
Set operations as methods: intersection, union, difference, symmetric_difference, issubset, issuperset, isdisjoint
The first four are also available as binary operators: & (intersection), | (union), - (difference), ^ (symmetric difference)

Set-modifying methods:
s.add(e): adds element e to set s.
s.remove(e): removes element e from set s; error if e is not in s.
s.discard(e): removes element e from set s, if it is present.
s.clear(): removes all elements from s.
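The practical difference between remove and discard shows up only when the element is missing: remove raises KeyError, discard does nothing. A minimal sketch:

```python
s = {1, 2, 3}

s.discard(10)            # 10 is not in s: nothing happens, no error
try:
    s.remove(10)         # 10 is not in s: raises KeyError
    removed = True
except KeyError:
    removed = False
print(removed)           # False

s.add(4)
s.remove(1)              # 1 is present, so this succeeds
print(sorted(s))         # [2, 3, 4]

s.clear()
print(s)                 # set()
```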

Examples:

>>> s = {1, 2, 3}
>>> s.add(1)
>>> s
{1, 2, 3}
>>> s[2] = 4
TypeError: 'set' object does not support item assignment
>>> s
{1, 2, 3}
>>> a = {2, 3, 40, 50}
>>> s.intersection(a)
{2, 3}
>>> s.difference(a)
{1}
>>> s.union(a)
{1, 2, 3, 40, 50}
>>> s | a
{1, 2, 3, 40, 50}
>>> s.symmetric_difference(a)
{1, 40, 50}
>>> a ^ s
{1, 40, 50}
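The comparison methods issubset, issuperset and isdisjoint are not covered by the examples above. The sketch below reuses s and a and adds a made-up set named small. It also shows that the set-operation methods accept any iterable and return a new set, leaving the original unchanged.

```python
s = {1, 2, 3}
a = {2, 3, 40, 50}
small = {2, 3}

print(small.issubset(s))        # True: every element of small is in s
print(s.issuperset(small))      # True
print(s.isdisjoint({7, 8}))     # True: no elements in common
print(s.isdisjoint(a))          # False: they share 2 and 3

# Methods take any iterable (here a list); operators need a set on both sides.
print(sorted(s.union([3, 4])))  # [1, 2, 3, 4]
common = s & a
print(sorted(common))           # [2, 3]
print(sorted(s))                # [1, 2, 3]  (s itself is unchanged)
```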


DICTIONARY

A dictionary, a.k.a. hash or associative array, is a flexibly-sized collection of key-value pairs, where the key and the value are Python objects.
• Key-value pairs are looked up by key, not by position. Since Python 3.7 a dictionary keeps its pairs in insertion order, but it does not sort them.
• Keys must be of an immutable type (e.g. int, string, tuple).
• A value can be of any type.
• Create a dictionary by using curly braces {} and listing key:value pairs separated by commas:

ages = {}                          # Create an empty dictionary
ages = {"john": 40, "peter": 45}   # Create a dictionary

• To add an entry to a dictionary: dictionary[key] = value, e.g. ages["susan"] = 50
• To delete an entry from a dictionary, use del dictionary[key]
• To check if a key is in the dictionary, use the in operator: key in dictionary

Method(params): returns       Description
d.keys(): dict_keys           Returns a view of the keys.
d.values(): dict_values       Returns a view of the values.
d.items(): dict_items         Returns a view of the (key, value) tuples.
d.clear(): None               Deletes all entries.
d.get(key): value             Returns the value for the key, or None if the key is not in d.
d.pop(key): value             Removes the entry for the key and returns its value; raises KeyError if the key is not in d.

Examples:

>>> names = {'Courtney': 753, 'Alexis': 119, 'Steven': 279}
>>> names
{'Courtney': 753, 'Alexis': 119, 'Steven': 279}
>>> 'John' in names
False
>>> 'Alexis' in names
True
>>> names['Steven'] = 35
>>> names
{'Courtney': 753, 'Alexis': 119, 'Steven': 35}
>>> names['Mark'] = 463
>>> names
{'Courtney': 753, 'Alexis': 119, 'Steven': 35, 'Mark': 463}
>>> print(names.get('susan'))
None
>>> names.get('Alexis')
119

# Iterating examples:
>>> for n, v in names.items():
...     print(n, v)
...
Courtney 753
Alexis 119
Steven 35
Mark 463
>>> for n in names:
...     print(names[n])
...
753
119
35
463

>>> len(names)
4
>>> names.pop('Courtney')
753
>>> names
{'Alexis': 119, 'Steven': 35, 'Mark': 463}
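Two details are not shown in the examples above: del removes an entry without returning it, and both get and pop accept an optional second argument that is returned when the key is missing. A minimal sketch with a made-up dictionary named ages:

```python
ages = {'john': 40, 'peter': 45}   # sample data for illustration
ages['susan'] = 50                 # add an entry
del ages['john']                   # delete an entry

print(ages.get('peter'))           # 45
print(ages.get('maria'))           # None: key is missing, no error
print(ages.get('maria', 0))        # 0: the supplied default

print(ages.pop('susan'))           # 50: entry is removed and its value returned
print(ages.pop('susan', -1))       # -1: default instead of a KeyError
print(ages)                        # {'peter': 45}

remaining_keys = list(ages.keys()) # turn the view into a list
print(remaining_keys)              # ['peter']
```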

DICTIONARIES AND SORTING

A dictionary does not sort its items by key or by value. However, its items can be accessed in a specific order with the built-in function sorted:

names = {'Courtney': 753, 'Alexis': 119, 'Steven': 15}
keysSortedLst = sorted(names.items())
print(keysSortedLst)

prints: [('Alexis', 119), ('Courtney', 753), ('Steven', 15)]

What if you want to access the dictionary in the order of its values, for example? Supply the additional parameter key to sorted; it specifies the value by which to sort. The parameter is a function that is applied to each item; here it returns the second element of a (key, value) tuple.

names = {'Courtney': 753, 'Alexis': 119, 'Steven': 15}

# define a function that extracts the second element of a sequence:
def itemvalue(item):
    # return the second element; for a dictionary item, that is the value
    return item[1]

valueSortedList = sorted(names.items(), key=itemvalue)
print(valueSortedList)

prints: [('Steven', 15), ('Alexis', 119), ('Courtney', 753)]
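The key function does not have to be a named function. As an alternative sketch (not the handout's own approach), the same ordering can be obtained with a lambda expression or with operator.itemgetter from the standard library. The last lines assume Python 3.7 or later, where a dictionary keeps insertion order.

```python
from operator import itemgetter

names = {'Courtney': 753, 'Alexis': 119, 'Steven': 15}

# Both key functions return element 1 of each (key, value) tuple.
by_value_lambda = sorted(names.items(), key=lambda item: item[1])
by_value_getter = sorted(names.items(), key=itemgetter(1))
print(by_value_lambda)   # [('Steven', 15), ('Alexis', 119), ('Courtney', 753)]
print(by_value_lambda == by_value_getter)   # True

# Rebuilding a dict from the sorted pairs keeps that order (Python 3.7+).
ordered = dict(by_value_lambda)
print(list(ordered))     # ['Steven', 'Alexis', 'Courtney']
```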

Function sorted(iterable, key=None, reverse=False)
Returns a new sorted list from the items in iterable.
• iterable -- an object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types such as dict and file objects.
• key -- specifies a function of one argument that is used to extract a comparison key from each element in iterable (for example, key=str.lower). The default value is None (compare the elements directly).
• reverse -- a boolean value. If set to True, the list elements are sorted as if each comparison were reversed.
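A short sketch of the key and reverse parameters, using made-up sample data. The last example combines both parameters with slicing to pick the entries with the largest values.

```python
words = ['banana', 'apple', 'Cherry']   # sample data for illustration

# Default comparison: uppercase letters sort before lowercase ones.
print(sorted(words))                                 # ['Cherry', 'apple', 'banana']

# key=str.lower compares the lowercased words, ignoring case.
print(sorted(words, key=str.lower))                  # ['apple', 'banana', 'Cherry']
print(sorted(words, key=str.lower, reverse=True))    # ['Cherry', 'banana', 'apple']

# Two largest values in a dictionary: sort by value, descending, then slice.
names = {'Courtney': 753, 'Alexis': 119, 'Steven': 15}
top_two = sorted(names.items(), key=lambda item: item[1], reverse=True)[:2]
print(top_two)                                       # [('Courtney', 753), ('Alexis', 119)]
```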

COMPREHENSIONS

Comprehensions can also be used to create sets and dictionaries.
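A minimal sketch of both forms, with made-up sample data: a set comprehension uses curly braces around a single expression, and a dictionary comprehension uses curly braces around a key: value pair.

```python
words = ['to', 'be', 'or', 'not', 'to', 'be']   # sample data for illustration

# Set comprehension: distinct word lengths.
lengths = {len(w) for w in words}
print(sorted(lengths))          # [2, 3]

# Set comprehension with a condition: words longer than two letters.
long_words = {w for w in words if len(w) > 2}
print(long_words)               # {'not'}

# Dictionary comprehension: each distinct word mapped to its number of occurrences.
counts = {w: words.count(w) for w in set(words)}
print(counts['to'], counts['or'])   # 2 1

# Dictionary comprehension over a range: number mapped to its square.
squares = {n: n * n for n in range(1, 4)}
print(squares)                  # {1: 1, 2: 4, 3: 9}
```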

Practice problem:

Assume speech1.txt and speech2.txt are two files containing the texts of speeches in the ‘text’ subfolder of the current working directory. Define functions to
1. Open the files and process the text of the two speeches to remove punctuation and stop words (see the posted file stopwords.py).
2. Find and return a collection of words that are present in both speeches.
3. For each speech, find and return the 10 most frequent words.
4. Find and return the 10 most frequent words among the words present in both speeches.
5. Find and return the 10 most frequent words among those that are unique to one of the speeches.
