Jupyter Snippet P4M 04

All of these Python notebooks are available at https://gitlab.erc.monash.edu.au/andrease/Python4Maths.git

Strings

Strings have already been discussed in Chapter 02, but they can also be treated as collections, similar to lists and tuples. For example:

S = 'The Taj Mahal is beautiful'
print([x for x in S if x.islower()]) # list of lower case characters
words=S.split() # list of words
print("Words are:",words)
print("--".join(words)) # join the words with '--'
" ".join(w.capitalize() for w in words) # capitalise each word
['h', 'e', 'a', 'j', 'a', 'h', 'a', 'l', 'i', 's', 'b', 'e', 'a', 'u', 't', 'i', 'f', 'u', 'l']
Words are: ['The', 'Taj', 'Mahal', 'is', 'beautiful']
The--Taj--Mahal--is--beautiful
'The Taj Mahal Is Beautiful'
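A few more collection-style operations on the same string S, as a short sketch (the variable names n_chars, has_taj and so on are only illustrative): len() counts characters, in tests for a substring, and set() and sorted() work on the characters just as they would on the elements of a list.

```python
S = 'The Taj Mahal is beautiful'

# len() counts characters, including spaces
n_chars = len(S)

# 'in' tests for a substring, not just a single character
has_taj = 'Taj' in S

# set() keeps each distinct character once; sorted() returns a list
distinct_lower = sorted(set(c for c in S if c.islower()))

# count() reports how many times a substring occurs
n_a = S.count('a')

print(n_chars, has_taj, n_a)
print(distinct_lower)
```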

String indexing and slicing work in the same way as for lists, which were explained in detail earlier.

print(S[4])
print(S[4:])
T
Taj Mahal is beautiful
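The sketch below shows a few more indexing forms carried over from lists (negative indices and a step), together with the one important difference: strings are immutable, so assigning to an index raises a TypeError and a new string has to be built instead.

```python
S = 'The Taj Mahal is beautiful'

last_char = S[-1]          # negative indices count from the end
last_word = S[-9:]         # the final nine characters
every_other = S[:7:2]      # start:stop:step, as with lists
reversed_s = S[::-1]       # a step of -1 walks backwards

# Unlike lists, strings are immutable: item assignment raises TypeError
error_message = ''
try:
    S[0] = 't'
except TypeError as err:
    error_message = str(err)

# To "change" a string, build a new one from slices instead
lowered_first = 't' + S[1:]
print(last_char, last_word, every_other, lowered_first)
```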

Dictionaries

Dictionaries are mappings between keys and the values stored against them. Alternatively, one can think of a dictionary as a set in which a value is stored against every element of the set. They can be defined as follows:

To define a dictionary, assign { } or dict() to a variable:

d = dict() # or equivalently d={}
print(type(d))
d['abc'] = 3
d[4] = "A string"
print(d)
<class 'dict'>
{'abc': 3, 4: 'A string'}

As the output above suggests, dictionaries can also be defined directly using the { key : value } syntax. The following dictionary has three elements:

d = { 1: 'One', 2 : 'Two', 100 : 'Hundred'}
len(d)
3

You can now access ‘One’ using its key, 1:

print(d[1])
One
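Looking up a key that is not in the dictionary raises a KeyError. A minimal sketch of that case and of the get() method, which returns None (or a default you supply) instead of raising:

```python
d = {1: 'One', 2: 'Two', 100: 'Hundred'}

# Indexing with a key that is not present raises KeyError
missing_key = None
try:
    d[3]
except KeyError as err:
    missing_key = err.args[0]      # the key that could not be found

# get() returns None, or a supplied default, instead of raising
maybe_three = d.get(3)
three_or_default = d.get(3, 'unknown')
one = d.get(1, 'unknown')          # present keys are returned as usual
print(missing_key, maybe_three, three_or_default, one)
```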

There are a number of alternative ways of specifying a dictionary, including as a list of (key, value) tuples. To illustrate this, we start with two related lists and pair up their elements as tuples using the zip() function; these pairs can then be merged to form a dictionary.

names = ['One', 'Two', 'Three', 'Four', 'Five']
numbers = [1, 2, 3, 4, 5]
[ (name,number) for name,number in zip(names,numbers)] # create (name,number) pairs
[('One', 1), ('Two', 2), ('Three', 3), ('Four', 4), ('Five', 5)]

Now we can create a dictionary that maps the name to the number as follows.

a1 = dict((name,number) for name,number in zip(names,numbers))
print(a1)
{'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}
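Since dict() accepts any iterable of (key, value) pairs, the generator expression is not strictly needed: the result of zip() can be passed straight in. The sketch below also shows the reverse mapping and the keyword-argument form of dict(), which works only when the keys are valid Python identifiers.

```python
names = ['One', 'Two', 'Three', 'Four', 'Five']
numbers = [1, 2, 3, 4, 5]

# zip() already yields (name, number) pairs, so dict() can consume it directly
a1 = dict(zip(names, numbers))

# Swapping the arguments gives the reverse mapping, number -> name
by_number = dict(zip(numbers, names))

# Keyword arguments also work when the keys are valid identifiers
small = dict(One=1, Two=2)
print(a1)
print(by_number[3], small)
```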

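Note that since Python 3.7, dictionaries are guaranteed to preserve the order in which keys are inserted, which is why the output above lists the names in their original order. In earlier versions the order was not guaranteed (it depended on the hash values of the keys), so code that must also run on older versions should not assume any particular ordering when iterating over a dictionary.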
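A small sketch of insertion-order behaviour, which the language guarantees from Python 3.7 onward: updating the value of an existing key keeps its position, while deleting and re-adding a key moves it to the end. When a different order is wanted (for example alphabetical), sort explicitly.

```python
d = {}
d['b'] = 1
d['a'] = 2
d['c'] = 3
order_after_inserts = list(d)        # iterating a dict yields its keys

d['b'] = 10                          # updating a value does not move the key
order_after_update = list(d)

del d['b']                           # deleting and re-inserting moves it to the end
d['b'] = 1
order_after_reinsert = list(d)

# For an order other than insertion order, sort the keys explicitly
sorted_keys = sorted(d)
print(order_after_inserts, order_after_update, order_after_reinsert, sorted_keys)
```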

Note: Any value used as a key must be hashable, which for the built-in types means immutable. So tuples (of immutable values) can be used as keys because they can’t be changed, but lists are not allowed. As an aside for more advanced readers, objects of user-defined classes can also be used as keys; unless the class defines its own __eq__ and __hash__ methods, the object’s identity (effectively its address) is used as the key, not the “value” of the object.
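A minimal sketch of both points: a list key is rejected with a TypeError (converting it to a tuple is the usual workaround), and instances of a class without __eq__/__hash__ are matched by identity. The class Point here is a hypothetical example, not part of the original notebook.

```python
locations = {}

# A tuple of immutable values is hashable, so it can be a key
locations[(3, 4)] = 'point A'

# A list is mutable and unhashable, so using it as a key fails
error_message = ''
try:
    locations[[3, 4]] = 'point B'
except TypeError as err:
    error_message = str(err)

# Converting the list to a tuple first is the usual workaround
locations[tuple([5, 6])] = 'point B'

class Point:                      # hypothetical class with no __eq__/__hash__
    def __init__(self, x, y):
        self.x, self.y = x, y

p, q = Point(1, 2), Point(1, 2)   # equal coordinates, but two distinct objects
by_object = {p: 'first'}
same_object_found = p in by_object     # the very same object is found
equal_value_found = q in by_object     # a different object with the same "value" is not
print(error_message, locations, same_object_found, equal_value_found)
```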

The use of tuples as keys is very common and allows for a (sparse) matrix type data structure:

matrix={ (0,1): 3.5, (2,17): 0.1}
matrix[2,2] = matrix[0,1] + matrix[2,17]
# matrix[2,2] is equivalent to matrix[ (2,2) ]
print(matrix)
{(0, 1): 3.5, (2, 17): 0.1, (2, 2): 3.6}
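In a sparse matrix stored this way, any entry that is not in the dictionary is implicitly zero, so get() with a default of 0.0 reads an arbitrary entry safely. As a sketch, a matrix-vector product then only needs to loop over the stored entries; the all-ones vector of length 18 is a hypothetical input chosen for illustration.

```python
matrix = {(0, 1): 3.5, (2, 17): 0.1}
matrix[2, 2] = matrix[0, 1] + matrix[2, 17]

# Missing entries of a sparse matrix are zero, so use get() with a default
entry_0_0 = matrix.get((0, 0), 0.0)

# Sparse matrix-vector product: only the stored (non-zero) entries contribute.
# 'vector' is a hypothetical dense input of length 18 (columns 0..17).
vector = [1.0] * 18
result = {}
for (row, col), value in matrix.items():
    result[row] = result.get(row, 0.0) + value * vector[col]
print(entry_0_0, result)
```

Rows with no stored entries simply do not appear in result, which keeps the output sparse as well.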

Dictionaries can also be built using a loop-style definition (a dictionary comprehension):

a2 = { name : len(name) for name in names}
print(a2)
{'One': 3, 'Two': 3, 'Three': 5, 'Four': 4, 'Five': 4}
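A dictionary comprehension can also filter with an if clause, and it is a convenient way to invert a mapping. A short sketch using the same names list:

```python
names = ['One', 'Two', 'Three', 'Four', 'Five']

# Keep only names longer than three characters
long_names = {name: len(name) for name in names if len(name) > 3}

# Build name -> position (counting from 1), then invert it by swapping keys and values
name_to_number = {name: i + 1 for i, name in enumerate(names)}
number_to_name = {number: name for name, number in name_to_number.items()}
print(long_names)
print(number_to_name)
```

Inverting like this is only safe when the values are unique: if two keys share a value, the later one silently overwrites the earlier one in the inverted dictionary.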

Built-in Functions

The len() function and the in operator have the obvious meaning: len() counts the (key, value) pairs, and in tests whether a key is present:

print("a1 has",len(a1),"elements")
print("One is in a1",'One' in a1,"but not 2:", 2 in a1) # 'in' checks keys only
a1 has 5 elements
One is in a1 True but not 2: False
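Because in only looks at the keys, searching for a value needs the values() view, and a complete (key, value) pair can be tested against items(). A brief sketch:

```python
a1 = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}

key_present = 'One' in a1                  # True: 'One' is a key
value_as_key = 2 in a1                     # False: 2 is a value, not a key
value_present = 2 in a1.values()           # True: search the values explicitly
pair_present = ('Two', 2) in a1.items()    # True: test a (key, value) pair
absent = 'Six' not in a1                   # 'not in' is the negation
print(key_present, value_as_key, value_present, pair_present, absent)
```

Note that searching the values scans them one by one, so it does not get the fast hash-based lookup that key tests enjoy.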

The clear() method erases all elements:

a2.clear()
print(a2)
{}
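clear() empties the dictionary object in place, which differs from assigning a new empty dictionary when another variable refers to the same object. A sketch with illustrative names:

```python
a2 = {'One': 3, 'Two': 3}
alias = a2           # a second name for the same dictionary object
a2.clear()           # empties the object in place, so alias sees it too
cleared_alias = dict(alias)

b2 = {'One': 3, 'Two': 3}
other = b2
b2 = {}              # rebinds b2 to a new empty dict; 'other' is unchanged
print(cleared_alias, other)
```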

The values() method returns all the values stored in the dictionary. (Actually, it is not quite a list but a view object, which we can iterate over just like a list to construct a list, tuple or any other collection.)

[ v for v in a1.values() ]
[1, 2, 3, 4, 5]
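A view is not a copy: it reflects later changes to the dictionary, whereas a list built from it is a fixed snapshot. A minimal sketch:

```python
a1 = {'One': 1, 'Two': 2, 'Three': 3}
vals = a1.values()            # a view, not a copy
snapshot = list(vals)         # list() copies the current values

a1['Four'] = 4                # the view reflects later changes to the dict...
current = list(vals)
# ...but the earlier list snapshot does not

total = sum(a1.values())      # views can be passed to any function taking an iterable
print(current, snapshot, total)
```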

The keys() method returns all the keys, that is, the indices under which the values are stored:

{ k for k in a1.keys() }
{'Five', 'Four', 'One', 'Three', 'Two'}
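The view returned by keys() behaves like a set, so set operators such as & (intersection) and - (difference) can be applied to the keys of two dictionaries directly. A short sketch, where the contents of a2 are made up for illustration:

```python
a1 = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}
a2 = {'One': 3, 'Two': 3, 'Six': 3}      # hypothetical second dictionary

common = a1.keys() & a2.keys()           # keys present in both (a set)
only_a1 = a1.keys() - a2.keys()          # keys present only in a1 (a set)
key_list = list(a1)                      # iterating the dict itself also yields keys
print(sorted(common), sorted(only_a1), key_list)
```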

The items() method returns the (key, value) pairs, with each element of the dictionary packed into a tuple. This is the same as the result obtained earlier with the zip() function, except that in Python versions before 3.7 the ordering could be ‘shuffled’ by the dictionary.

",  ".join( "%s = %d" % (name,val) for name,val in a1.items())
'One = 1,  Two = 2,  Three = 3,  Four = 4,  Five = 5'
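The same string can be produced with an f-string. Because items() yields tuples, it also combines naturally with sorted(); the sketch below orders the pairs by value (largest first) and by key.

```python
a1 = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}

# Same output as the %-formatting example, written with an f-string
text = ",  ".join(f"{name} = {val}" for name, val in a1.items())

# items() gives (key, value) tuples, so sorted() can order them by value
by_value_desc = sorted(a1.items(), key=lambda kv: kv[1], reverse=True)

# Sorting the tuples directly orders them by key (alphabetically)
by_key = sorted(a1.items())
print(text)
print(by_value_desc[:2], by_key[0])
```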

The pop() method removes the element with the given key and returns its value, which can be assigned to a new variable. Remember that only the value is returned, not the key, because the key is just the index you supplied.

val = a1.pop('Four')
print(a1)
print("Removed",val)
{'One': 1, 'Two': 2, 'Three': 3, 'Five': 5}
Removed 4
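pop() raises a KeyError if the key is missing, unless a default is given as a second argument. A sketch of that variant alongside popitem(), which removes the most recently inserted pair, and del, which removes a key without returning anything:

```python
a1 = {'One': 1, 'Two': 2, 'Three': 3, 'Five': 5}

# pop() with a second argument returns that default instead of raising KeyError
missing = a1.pop('Four', None)

# popitem() removes and returns the most recently inserted (key, value) pair
last_pair = a1.popitem()

# del removes a key without returning its value
del a1['One']
print(missing, last_pair, a1)
```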

When to use Dictionaries vs Lists

The choice of whether to store data in a list or dictionary (or set) may seem a bit arbitrary at times. Here is a brief summary of some of the pros and cons of these:

  • Finding elements in a set vs a list: x in C is valid whether the collection C is a list, set or dictionary. However, for large collections this is computationally much slower with lists than with sets or dictionaries, because a list is searched element by element while sets and dictionaries use hashing. On the other hand, if all items are indexed by an integer, then x[45672] is faster to look up if x is a list than if it is a dictionary.
  • If all your items are indexed by integers but with some indices unused, you could use a list and assign some dummy value (e.g. None or "") wherever there is no corresponding item. For very sparse collections this could consume significant additional memory compared to a dictionary. On the other hand, if most values are present, then storing the indices explicitly (as is done in a dictionary) could consume significant additional memory compared to the list representation.
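The first point can be checked with a quick timing experiment (exact timings depend on the machine; time.perf_counter() is used because time.clock() was removed in Python 3.8):
import time
bigList = [i for i in range(0,100000)]
bigSet = set(bigList)
start = time.perf_counter()  # how long to find the last number out of 100,000 items?
99999 in bigList
print("List lookup time: %.6f ms" % (1000*(time.perf_counter()-start)))
start = time.perf_counter()
99999 in bigSet
print("Set lookup time:  %.6f ms" % (1000*(time.perf_counter()-start)))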
List lookup time: 1.031000 ms
Set lookup time:  0.043000 ms
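The second point, sparse integer-indexed data, can be sketched by storing the same two items both ways. The data set here is hypothetical (only indices 3 and 999,997 carry values). sys.getsizeof() reports only the size of the container itself, and the exact numbers vary between Python versions, but the dense list is clearly far larger.

```python
import sys

# Hypothetical sparse data: only two of a million possible indices are used
size = 1_000_000
sparse = {3: 'a', 999_997: 'b'}

# Dense list version: every unused index holds a placeholder (None here)
dense = [None] * size
for index, value in sparse.items():
    dense[index] = value

# Converting back keeps only the indices that actually hold a value
recovered = {i: v for i, v in enumerate(dense) if v is not None}

# Container sizes in bytes (implementation-dependent; elements not included)
dense_bytes = sys.getsizeof(dense)
sparse_bytes = sys.getsizeof(sparse)
print(len(sparse), len(dense), recovered == sparse, dense_bytes, sparse_bytes)
```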