Jupyter Snippet P4M 03

All of these python notebooks are available at https://gitlab.erc.monash.edu.au/andrease/Python4Maths.git

Data Structures

So far we have only seen numbers and strings and how to write simple expressions involving them. In general, writing programs is about managing more complex collections of such items, which means thinking about data structures for storing the data and algorithms for manipulating them. This part of the tutorial and the next look at some of the powerful built-in data structures included in Python, namely the list, tuple, dict and set data structures.

Lists

Lists are the most commonly used data structure. Think of a list as a sequence of data enclosed in square brackets, with the items separated by commas. Each element of a list can be accessed by its position within the list.

A list is created by assigning either ‘[ ]’ or list() to a variable.

a = []
type(a)
list

One can directly assign the sequence of data to a list x as shown.

x = ['apple', 'orange']

Indexing

In Python, indexing starts from 0, as already seen for strings. Thus the list x, which has two elements, has apple at index 0 and orange at index 1.

x[0]
'apple'

Indexing can also be done in reverse order, so that the last element is accessed first. Here, indexing starts from -1. Thus index -1 gives orange and index -2 gives apple.

x[-1]
'orange'

As you might have already guessed, x[0] == x[-2] and x[1] == x[-1]. This concept extends to lists with many more elements.
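The relationship between positive and negative indices can be checked on a longer list. The following is a small sketch; the list fruits is made up purely for illustration.

```python
# Hypothetical list used only for this illustration
fruits = ['apple', 'orange', 'banana', 'cherry']

# For every valid k, fruits[-k] is the same element as fruits[len(fruits) - k]
for k in range(1, len(fruits) + 1):
    assert fruits[-k] == fruits[len(fruits) - k]

last = fruits[-1]          # the final element
second_last = fruits[-2]   # the element before it
print(last, second_last)   # cherry banana
```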

y = ['carrot','potato']

Here we have declared two lists x and y, each containing its own data. These two lists can in turn be put into another list, say z, whose data is the two lists. A list inside a list is called a nested list, and this is how an array would be declared, as we will see later.

z  = [x,y]
print( z )
[['apple', 'orange'], ['carrot', 'potato']]

Indexing in nested lists can be quite confusing if you do not understand how indexing works in python. So let us break it down and then arrive at a conclusion.

Let us access the data ‘orange’ in the above nested list. First, at index 0 there is a list [‘apple’,‘orange’] and at index 1 there is another list [‘carrot’,‘potato’]. Hence z[0] gives us the first list, which contains ‘apple’ and ‘orange’. From this list we take the second element (index 1) to get ‘orange’.

print(z[0][1])
orange
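The two steps can also be written out separately, which may make the chained form z[0][1] easier to read. This is only a sketch; the intermediate names inner and item are introduced here for illustration.

```python
x = ['apple', 'orange']
y = ['carrot', 'potato']
z = [x, y]

inner = z[0]      # first step: select the inner list ['apple', 'orange']
item = inner[1]   # second step: select the element at index 1 of that list

print(item)               # orange
print(item == z[0][1])    # True: chained indexing performs both steps at once
print(z[1][0])            # carrot
```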

Lists do not have to be homogeneous. Each element can be of a different type:

["this is a valid list",2,3.6,(1+2j),["a","sublist"]]
['this is a valid list', 2, 3.6, (1+2j), ['a', 'sublist']]

Slicing

Indexing is limited to accessing a single element. Slicing, on the other hand, accesses a sequence of data inside the list; in other words, it “slices” the list.

Slicing is done by giving the index of the first element required and the index just past the last element required. It is written as parentlist[ a : b ], where a and b are index values from the parent list; the slice runs from index a up to, but not including, index b. If a is omitted the slice starts at the first element, and if b is omitted it runs to the end of the list.

num = [0,1,2,3,4,5,6,7,8,9]
print(num[0:4])
print(num[4:])
[0, 1, 2, 3]
[4, 5, 6, 7, 8, 9]

You can also slice a parent list with a step length, written as parentlist[ a : b : step ].

num[:9:3]
[0, 3, 6]
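A few more slices on the same list may help fix the rules. This is a sketch rather than an exhaustive list; the variable names are chosen here for illustration.

```python
num = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

first_four = num[0:4]       # indices 0 to 3; index 4 itself is excluded
every_second = num[::2]     # start and stop omitted, step of 2
reversed_copy = num[::-1]   # a negative step walks the list backwards
last_three = num[-3:]       # negative indices also work in slices

print(first_four)      # [0, 1, 2, 3]
print(every_second)    # [0, 2, 4, 6, 8]
print(reversed_copy)   # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(last_three)      # [7, 8, 9]
```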

Built in List Functions

To find the length of the list or the number of elements in a list, len( ) is used.

len(num)
10

If the list consists entirely of numbers then min( ) and max( ) give the minimum and maximum values in the list. Similarly, sum( ) gives the sum of the elements.

print("min =",min(num),"  max =",max(num),"  total =",sum(num))
min = 0   max = 9   total = 45

Lists can be concatenated by adding them with ‘+’. The resultant list will contain all the elements of the lists that were added. The resultant list will not be a nested list.

[1,2,3] + [5,4,7]
[1, 2, 3, 5, 4, 7]

You may need to check whether a particular element is present in a list. Consider the list below.

names = ['Earth','Air','Fire','Water']

Suppose we want to check whether ‘Fire’ and ‘Metal’ are present in the list names. A conventional approach would be to iterate over the list with a for loop and test each element with an if condition. In Python, however, you can use the expression a in b, which evaluates to True if a is present in b and False if not.

'Fire' in names
True
'Metal' in names
False

In a list with string elements, max( ) and min( ) are still applicable and return the last and first element, respectively, in lexicographical order.

mlist = ['bzaa','ds','nc','az','z','klm']
print("max =",max(mlist))
print("min =",min(mlist))
max = z
min = az

Strings are compared character by character, starting with the first. Here z has the highest character code among the first characters, so z is returned as the maximum, and az, the only element starting with a, is the minimum. But what if numbers are declared as strings?

nlist = ['5','24','93','1000']
print("max =",max(nlist))
print('min =',min(nlist))
max = 93
min = 1000

Even when the strings contain numbers, they are compared character by character rather than by numeric value: 93 is the maximum because 9 is the largest first character, and 1000 is the minimum because 1 is the smallest.

If you want to find the max( ) string element based on the length of the string, an additional parameter key can be used to specify a function that generates the value on which to compare. Hence finding the longest and shortest strings in mlist can be done using the len function:

print('longest =',max(mlist, key=len))
print('shortest =',min(mlist, key=len))
longest = bzaa
shortest = z

Any other built-in or user-defined function can be used as the key.

A string can be converted into a list by using the list() function, or more usefully using the split() method, which breaks strings up on whitespace.
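As a sketch of both cases, the built-in int can serve as the key to compare numeric strings by value, and a user-defined function can supply any other criterion. The helper last_char below is hypothetical and exists only for this example.

```python
nlist = ['5', '24', '93', '1000']
mlist = ['bzaa', 'ds', 'nc', 'az', 'z', 'klm']

# Built-in int as key: compare by numeric value, but return the original strings
largest = max(nlist, key=int)
smallest = min(nlist, key=int)

def last_char(s):
    """Hypothetical helper: compare strings by their final character."""
    return s[-1]

# 'az' and 'z' both end in 'z'; max() returns the first of the tied elements
by_last = max(mlist, key=last_char)

print(largest, smallest, by_last)   # 1000 5 az
```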

print(list('hello world !'),'Hello   World !!'.split())
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', ' ', '!'] ['Hello', 'World', '!!']

append( ) is used to add a single element at the end of the list.

lst = [1,1,4,8,7]
lst.append(1)
print(lst)
[1, 1, 4, 8, 7, 1]

Appending a list to a list would create a sublist. If a nested list is not what is desired then the extend( ) function can be used.

lst.extend([10,11,12])
print(lst)
[1, 1, 4, 8, 7, 1, 10, 11, 12]
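The difference between the two is easiest to see side by side. A minimal sketch, with list names chosen here for illustration:

```python
nested = [1, 2, 3]
nested.append([4, 5])   # the whole list becomes one new element

flat = [1, 2, 3]
flat.extend([4, 5])     # each element is added individually

print(nested, len(nested))   # [1, 2, 3, [4, 5]] 4
print(flat, len(flat))       # [1, 2, 3, 4, 5] 5
```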

count( ) is used to count the number of times a particular element occurs in the list.

lst.count(1)
3

index( ) is used to find the index value of a particular element. Note that if there are multiple elements of the same value then the first index value of that element is returned.

lst.index(1)
0

insert(x,y) is used to insert an element y at a specified index value x. Note that L.append(y) is equivalent to L.insert(len(L),y), that is, insertion right at the end of the list L.

lst.insert(5, 'name')
print(lst)
[1, 1, 4, 8, 7, 'name', 1, 10, 11, 12]
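The relationship between insert( ) and append( ) can be checked directly. In this sketch the list names are made up for illustration; it also shows that an index at or beyond the end of the list simply places the element at the end.

```python
appended = [1, 2, 3]
inserted = [1, 2, 3]
clamped = [1, 2, 3]

appended.append(4)
inserted.insert(len(inserted), 4)   # index equal to the length: goes at the end
clamped.insert(100, 4)              # an index past the end is treated as the end

print(appended == inserted == clamped)   # True
print(appended)                          # [1, 2, 3, 4]
```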

insert(x,y) inserts but does not replace an element. If you want to replace an element with another, simply assign the new value to that index.

lst[5] = 'Python'
print(lst)
[1, 1, 4, 8, 7, 'Python', 1, 10, 11, 12]

The pop( ) function removes and returns the last element in the list. This is similar to the operation of a stack. Hence lists can be used as stacks by using append() for push and pop() to remove the most recently added element.

lst.pop()
12
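A minimal sketch of a list used as a stack; the name stack and its contents are chosen here for illustration.

```python
stack = []
stack.append('a')   # push
stack.append('b')
stack.append('c')

top = stack.pop()   # removes and returns the most recently pushed item

print(top)     # c
print(stack)   # ['a', 'b']
```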

An index value can be specified to pop the element at that index.

lst.pop(0)
1

pop( ) removes an element based on its index value and returns it, so the result can be assigned to a variable. One can also remove an element by specifying the element itself using the remove( ) function, which removes its first occurrence.

lst.remove('Python')
print(lst)
[1, 4, 8, 7, 1, 10, 11]

An alternative to the remove( ) function that uses an index value is del.

del lst[1]
print(lst)
[1, 8, 7, 1, 10, 11]

All the elements in the list can be reversed in place by using the reverse() function.

lst.reverse()
print(lst)
[11, 10, 1, 7, 8, 1]

Note that reverse( ) only reorders the top-level elements. If the list contained a nested list such as [5,4,2,8], that nested list would be treated as a single element of the parent list, so the elements inside it would not be reversed.

Python offers the built-in operation sort( ) to arrange the elements in ascending order. Alternatively, sorted() can be used to construct a copy of the list in sorted order.

lst.sort()
print(lst)
print(sorted([3,2,1])) # another way to sort
[1, 1, 7, 8, 10, 11]
[1, 2, 3]
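One practical consequence of this difference is that sort( ) changes the list in place and returns None, while sorted() leaves its argument untouched. A short sketch, with variable names chosen for illustration:

```python
values = [3, 1, 2]
result = values.sort()       # sorts in place; the return value is None
print(result, values)        # None [1, 2, 3]

original = [3, 1, 2]
ordered = sorted(original)   # returns a new list; original is unchanged
print(original, ordered)     # [3, 1, 2] [1, 2, 3]
```

Writing values = values.sort() is therefore a common slip: it replaces the list with None.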

For descending order an optional keyword argument reverse is provided. Setting this to True would arrange the elements in descending order.

print(sorted(lst,reverse=True)) 
print(lst) # remember that `sorted` creates a copy of the list in sorted order
[11, 10, 8, 7, 1, 1]
[1, 1, 7, 8, 10, 11]

Similarly, for lists containing string elements, sort( ) sorts the elements based on their character codes in ascending order, or in descending order when reverse=True is specified.

names.sort()
print(names)
names.sort(reverse=True)
print(names)
['Air', 'Earth', 'Fire', 'Water']
['Water', 'Fire', 'Earth', 'Air']

To sort based on length, key=len should be specified as shown.

names.sort(key=len)
print(names)
print(sorted(names,key=len,reverse=True))
['Air', 'Fire', 'Water', 'Earth']
['Water', 'Earth', 'Fire', 'Air']

Copying a list

Assignment of a list does not imply copying. It simply creates a second reference to the same list. Most new Python programmers get caught out by this initially. Consider the following:

lista= [2,1,4,3]
listb = lista
print(listb)
[2, 1, 4, 3]

Here we have declared a list, lista = [2,1,4,3], and assigned it to listb, which looks as though it copies the list. Now we perform some arbitrary operations on lista.

lista.sort()
lista.pop()
lista.append(9)
print("A =",lista)
print("B =",listb)
A = [1, 2, 3, 9]
B = [1, 2, 3, 9]

listb has also changed even though no operation has been performed on it. This is because in Python assignment binds a new name to the same object rather than creating a copy. So how do we fix this?

If you recall, in slicing we saw that parentlist[a:b] returns a list from the parent list starting at index a and ending just before index b, and that if a and b are omitted the slice covers the whole list. We use the same concept here. The slice lista[:] is a new list holding the same data, so assigning it to listb gives listb its own copy.

lista = [2,1,4,3]
listb = lista[:] # make a copy by taking a slice from beginning to end
print("Starting with:")
print("A =",lista)
print("B =",listb)
lista.sort()
lista.pop()
lista.append(9)
print("Finished with:")
print("A =",lista)
print("B =",listb)
Starting with:
A = [2, 1, 4, 3]
B = [2, 1, 4, 3]
Finished with:
A = [1, 2, 3, 9]
B = [2, 1, 4, 3]
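A slice copies only the outer list. If the list contains nested lists, the copy still shares those inner lists with the original. The sketch below illustrates this and uses copy.deepcopy from the standard library as one way to copy the inner lists as well; the variable names are made up for this example.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested[:]            # new outer list, but the inner lists are shared
deep = copy.deepcopy(nested)   # the inner lists are copied as well

nested[0].append(99)           # modify an inner list through the original

print(shallow[0])   # [1, 2, 99]  (shared with nested)
print(deep[0])      # [1, 2]      (independent)
```

For flat lists such as lista above, lista[:], list(lista) and lista.copy() all produce an independent copy.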

List comprehension

A very powerful concept in Python (that also applies to tuples, sets and dictionaries, as we will see later) is the ability to define lists using a list comprehension (looping) expression. For example:

[i**2 for i in [1,2,3]]
[1, 4, 9]

In general this takes the form [ <expression> for <variable> in <List> ]. That is, a new list is constructed by taking each element of the given List in turn, assigning it to the variable and then evaluating the expression with this variable assignment.

In the example above, this constructs a new list by taking each element of the original [1,2,3] and squaring it. We can have multiple such implied loops to get, for example:

[10*i+j for i in [1,2,3] for j in [5,7]]
[15, 17, 25, 27, 35, 37]

Finally the looping can be filtered using an if expression with the for - in construct.

[10*i+j for i in [1,2,3] if i%2==1 for j in [4,5,7] if j >= i+4] # keep odd i and  j larger than i+3 only
[15, 17, 37]
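It can help to read a comprehension as shorthand for nested for loops and if statements, written in the same left-to-right order. The following sketch spells out the filtered example above; the names result and comprehension are introduced here for illustration.

```python
result = []
for i in [1, 2, 3]:
    if i % 2 == 1:                 # keep odd i only
        for j in [4, 5, 7]:
            if j >= i + 4:         # keep j that is at least i + 4
                result.append(10 * i + j)

comprehension = [10*i+j for i in [1,2,3] if i%2==1 for j in [4,5,7] if j >= i+4]

print(result)                    # [15, 17, 37]
print(result == comprehension)   # True
```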

Tuples

Tuples are similar to lists; the one big difference is that the elements inside a list can be changed but the elements of a tuple cannot. Tuples are the natural extension of ordered pairs, triplets etc. in mathematics. To show how this works, consider the following code working with Cartesian coordinates in three dimensions:

origin = (0.0,0.0,0.0)
x = origin
# x[1] = 1 # can't do something like this as it would change the origin
x = (1, 0, 0) # perfectly OK
print(x)
print(type(x))
(1, 0, 0)
<class 'tuple'>

To define a tuple, a variable is assigned to parentheses ( ) or tuple( ).

tup = () # empty, zero-length tuple
tup2 = tuple()

If you want to directly declare a tuple of length 1 it can be done by using a comma at the end of the data.

27,
(27,)

Multiplying the number 27 by 2 yields 54, but multiplying a tuple by 2 repeats its data twice.

2*(27,)
(27, 27)

Values can be assigned while declaring a tuple. tuple( ) takes a list as input and converts it into a tuple, or takes a string and converts it into a tuple of its characters.

tup3 = tuple([1,2,3])
print(tup3)
tup4 = tuple('Hello')
print(tup4)
(1, 2, 3)
('H', 'e', 'l', 'l', 'o')

It follows the same indexing and slicing as Lists.

print(tup3[1])
tup5 = tup4[:3]
print(tup5)
2
('H', 'e', 'l')
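Reading elements works as for lists, but assigning to an index does not. The sketch below shows the error Python raises, and one possible workaround of converting to a list and back when a modified tuple is needed; the names as_list and moved are made up for this example.

```python
origin = (0.0, 0.0, 0.0)

try:
    origin[1] = 1          # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)

as_list = list(origin)     # convert to a list when a change is needed
as_list[1] = 1
moved = tuple(as_list)     # build a new tuple; origin itself is unchanged

print(origin)   # (0.0, 0.0, 0.0)
print(moved)    # (0.0, 1, 0.0)
```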

Mapping one tuple to another

Tuples can be used as the left-hand side of assignments and are matched to the corresponding right-hand side elements, assuming they have the right length.

(a,b,c)= ('alpha','beta','gamma') # parentheses are optional
a,b,c= 'alpha','beta','gamma' # The same as the above
print(a,b,c)
a,b,c = ['Alpha','Beta','Gamma'] # can assign lists
print(a,b,c)
[a,b,c]=('this','is','ok') # even this is OK
print(a,b,c)
alpha beta gamma
Alpha Beta Gamma
this is ok

More complex nested unpackings of values are also possible.

(w,(x,y),z)=(1,(2,3),4)
print(w,x,y,z)
(w,xy,z)=(1,(2,3),4)
print(w,xy,z) # notice that xy is now a tuple
1 2 3 4
1 (2, 3) 4
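Two consequences of tuple assignment are worth seeing in code: values can be swapped without a temporary variable, and a length mismatch raises an error. This is a small sketch; the names p, q and mismatch are introduced only for illustration.

```python
a, b = 1, 2
a, b = b, a            # the right-hand side is evaluated first, then unpacked
print(a, b)            # 2 1

mismatch = False
try:
    p, q = (1, 2, 3)   # three values cannot be matched to two names
except ValueError as err:
    mismatch = True
    print("ValueError:", err)
```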

Built In Tuple functions

The count() function counts the number of times a specified element occurs in the tuple.

d=tuple('a string with many "a"s')
d.count('a')
3

The index() function returns the index of the specified element. If the element occurs more than once, the index of its first occurrence is returned.

d.index('a')
0

Note that many of the other list functions such as min(), max(), sum() and sorted(), as well as the operator in, also work for tuples in the expected way.

Sets

Sets are mainly used to eliminate repeated elements in a sequence/list. They are also used to perform some standard set operations.

Sets are declared as set(), which will initialize an empty set. Also set([sequence]) can be executed to declare a set with elements. Note that unlike lists, the elements of a set are not in a sequence and cannot be accessed by an index.

set1 = set()
print(type(set1))
<class 'set'>
set0 = set([1,2,2,3,3,4])
set0 = {3,3,4,1,2,2} # equivalent to the above
print(set0) # order is not preserved
{1, 2, 3, 4}

The elements 2 and 3, which are repeated twice, appear only once. Thus in a set each element is distinct.

However be warned that {} is NOT a set, but a dictionary (see next chapter of this tutorial).

type({})
dict
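Returning to the main use mentioned at the start of this section, here is a sketch of removing repeats from a list. The data in readings is hypothetical. Because a set has no order, sorted() is used to get back a list in a predictable order.

```python
readings = [3, 1, 3, 2, 1, 2, 4]   # hypothetical data containing repeats

distinct = set(readings)           # repeats collapse to one copy each
unique_sorted = sorted(distinct)   # sorted() returns an ordered list

print(len(readings), len(distinct))   # 7 4
print(unique_sorted)                  # [1, 2, 3, 4]
```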

Built-in Functions

set1 = set([1,2,3])
set2 = set([2,3,4,5])

The union( ) function returns a set which contains all the elements of both sets without repetition.

set1.union(set2)
{1, 2, 3, 4, 5}

add( ) adds a particular element to the set. Note that a set has no index, so the position at which the newly added element is displayed is arbitrary and not necessarily at the end.

set1.add(0)
set1
{0, 1, 2, 3}

intersection( ) function outputs a set which contains all the elements that are in both sets.

set1.intersection(set2)
{2, 3}

The difference( ) function outputs a set which contains the elements that are in set1 and not in set2.

set1.difference(set2)
{0, 1}

symmetric_difference( ) function computes the set of elements that are in exactly one of the two given sets.

set2.symmetric_difference(set1)
{0, 1, 4, 5}
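The four operations above also have operator forms, which some readers may find closer to mathematical notation. A short sketch using the same values as set1 and set2 at this point in the tutorial; the result names are chosen for illustration.

```python
set1 = {0, 1, 2, 3}
set2 = {2, 3, 4, 5}

union = set1 | set2            # same as set1.union(set2)
common = set1 & set2           # same as set1.intersection(set2)
only_in_set1 = set1 - set2     # same as set1.difference(set2)
in_exactly_one = set1 ^ set2   # same as set1.symmetric_difference(set2)

print(union == set1.union(set2))                          # True
print(common == set1.intersection(set2))                  # True
print(only_in_set1 == set1.difference(set2))              # True
print(in_exactly_one == set1.symmetric_difference(set2))  # True
```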

issubset( ), isdisjoint( ) and issuperset( ) are used to check whether set1 is a subset of, disjoint from, or a superset of set2, respectively.

print( set1.issubset(set2) )
print( set1.isdisjoint(set2) )
print( set1.issuperset(set2) )
False
False
False

pop( ) is used to remove and return an arbitrary element of the set.

set1.pop()
print(set1)
{1, 2, 3}

remove( ) function deletes the specified element from the set.

set1.remove(2)
set1
{1, 3}
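If the element is not in the set, remove( ) raises a KeyError. The related method discard( ) does the same job but silently does nothing when the element is absent. A sketch, with the names s and missing chosen for illustration:

```python
s = {1, 3}

s.discard(2)        # 2 is absent; discard() does nothing
print(s)            # {1, 3}

missing = False
try:
    s.remove(2)     # remove() raises KeyError for an absent element
except KeyError:
    missing = True

print(missing)      # True
```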

clear( ) is used to clear all the elements and make that set an empty set.

set1.clear()
set1
set()

Empty means false

In Python an empty data structure is always treated as False in a boolean context.

 "" and not set() and not [] and not {}
''
"" or [] # returns the last "False" value
[]
{1,2} or ""
{1, 2}
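This is why a collection is commonly tested for emptiness with a plain if rather than by comparing its length to zero. A minimal sketch; the helper describe is hypothetical and exists only for this example.

```python
def describe(items):
    """Hypothetical helper: report whether a collection holds anything."""
    if not items:          # true for [], (), set(), {} and ""
        return "empty"
    return "non-empty"

print(describe([]))       # empty
print(describe(set()))    # empty
print(describe((0,)))     # non-empty: one element, even though that element is 0
print(describe("text"))   # non-empty
```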