Jupyter Snippet P4M 02

All of these python notebooks are available at https://gitlab.erc.monash.edu.au/andrease/Python4Maths.git

Working with strings

Recall from the previous section that strings can be entered with single, double or triple quotes:

  'All', "of", '''these''', """are
  valid strings"""

Unicode: Python strings are Unicode strings; however, for the most part this will be ignored here. If you are working in an editor that supports Unicode, you can use non-ASCII characters in strings (or even in variable names). Alternatively, typing an escape sequence such as "\u00B3" will give you the string “³” (superscript 3).
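A small sketch of the escape sequence described above. The variable name `cube` is only illustrative, and the named escape `\N{...}` (which looks a character up by its official Unicode name) is an addition not mentioned in the text.

```python
# "\u00B3" is an escape sequence for the single character U+00B3
cube = "x\u00B3"
print(cube)                                  # x³
print(len("\u00B3"))                         # 1: the escape is one character
# The same character can be typed directly or given by its Unicode name
print("x³" == cube)                          # True
print("\N{SUPERSCRIPT THREE}" == "\u00B3")   # True
print(ord("³"), hex(ord("³")))               # 179 0xb3
```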

The print() Function

As seen previously, the print() function prints all of its arguments as strings, separated by spaces and followed by a line break:

- print("Hello World")
- print("Hello",'World')
- print("Hello", <Variable>)

Note that print is different in older versions of Python (2.7 and earlier), where it was a statement and did not need parentheses around its arguments.

print("Hello","World")
Hello World

The print() function has some optional arguments to control where and how to print. These include sep (the separator, a space by default), end (the string printed at the end, a newline by default) and file (to write to a file instead of the screen). When writing to a file, setting flush=True may be useful to force the function to write the output immediately. Without it, Python may buffer the output, which improves speed for repeated calls to print() but is unhelpful if, for example, you want to see the output immediately while debugging.

print("Hello","World",sep='...',end='!!',flush=True)
Hello...World!!
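The file argument is not demonstrated above, so here is a minimal sketch. It assumes an in-memory io.StringIO object as a stand-in for a real file (a file opened with open(..., "w") would be used the same way); the name `buffer` is illustrative.

```python
import io
import sys

# io.StringIO behaves like a text file held in memory,
# so nothing is written to disk in this example
buffer = io.StringIO()
print("x", 1, 2.5, sep=", ", end=";\n", file=buffer)
print("second line", file=buffer)
text = buffer.getvalue()
print(repr(text))  # 'x, 1, 2.5;\nsecond line\n'

# Messages can also go to the standard error stream; flush=True
# asks Python to write them out immediately instead of buffering
print("debug: reached this point", file=sys.stderr, flush=True)
```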

String Formatting

There are lots of methods for formatting and manipulating strings built into Python. Some of these are illustrated here.

String concatenation is the “addition” of two strings. Note that concatenation does not insert a space between the strings.

string1='World'
string2='!'
print('Hello' + " " + string1 + string2)
Hello World!
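Concatenation only works between strings. As a sketch (the variable names are illustrative), a number has to be converted with str() before it can be added to a string:

```python
age = 42
# Adding a str and an int raises TypeError
try:
    message = "Age: " + age
except TypeError as err:
    print("TypeError:", err)
# Converting the number with str() first works
message = "Age: " + str(age)
print(message)  # Age: 42
```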

The % operator formats a string by inserting the value that comes after it. It relies on the string containing a format specifier that identifies where to insert the value. The most common types of format specifiers are:

- %s -> string
- %d -> Integer
- %f -> Float
- %o -> Octal
- %x -> Hexadecimal
- %e -> exponential

These will be very familiar to anyone who has ever written a C or Java program and follow nearly exactly the same rules as the printf() function.

print("Hello %s" % string1)
print("Actual Number = %d" %18)
print("Float of the number = %f" %18)
print("Octal equivalent of the number = %o" %18)
print("Hexadecimal equivalent of the number = %x" %18)
print("Exponential equivalent of the number = %e" %18)
Hello World
Actual Number = 18
Float of the number = 18.000000
Octal equivalent of the number = 22
Hexadecimal equivalent of the number = 12
Exponential equivalent of the number = 1.800000e+01

To insert multiple values, put them in parentheses after the %. Values are inserted in the order they appear in the parentheses (more on tuples in the next section):

print("Hello %s %s. The meaning of life is %d" %(string1,string2,42))
Hello World !. The meaning of life is 42
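The number of values has to match the number of format specifiers. The sketch below (names illustrative) shows the error raised when a value is missing, and how to format a value that is itself a tuple by wrapping it in a one-element tuple:

```python
# Two specifiers but only one value: Python raises TypeError
try:
    print("%s and %s" % ("only one",))
except TypeError as err:
    print("TypeError:", err)

# A single value does not need parentheses...
single = "Hello %s" % "World"
# ...but a tuple value must be wrapped, otherwise its items
# would be treated as separate values
point = (1, 2)
shown = "point = %s" % (point,)
print(shown)  # point = (1, 2)
```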

We can also specify the width of the field and the number of decimal places to be used. For example:

print('Print width 10: |%10s|'%'x')
print('Print width 10: |%-10s|'%'x') # left justified
print("The number pi = %.2f to 2 decimal places"%3.1415)
print("More space pi = %10.2f"%3.1415)
print("Pad pi with 0 = %010.2f"%3.1415) # pad with zeros
Print width 10: |         x|
Print width 10: |x         |
The number pi = 3.14 to 2 decimal places
More space pi =       3.14
Pad pi with 0 = 0000003.14

Other String Methods

Multiplying a string by an integer n repeats it n times:

print("Hello World! "*5)
Hello World! Hello World! Hello World! Hello World! Hello World! 
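A couple of further cases of repetition, as a short sketch: the integer may come first, and a count of zero or less gives the empty string. The name `line` is illustrative.

```python
line = "-" * 20          # a quick separator line of 20 dashes
print(line)
print(3 * "ab")          # ababab: the integer may come first
print(repr("ab" * 0))    # '': zero (or a negative count) gives an empty string
```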

Formatting

Strings can be transformed by a variety of functions that are all methods of a string. That is, they are called by writing the string, then a ".", then the method name (for example s.upper()). Each of these returns a new string. They include:

  • Upper vs lower case: upper(), lower(), capitalize(), title() and swapcase(), with mostly the obvious meaning. Note that capitalize() makes the first letter of the string upper case and the rest lower case, while title() makes the first letter of every word upper case and the remaining letters lower case.
  • Padding strings: center(n), ljust(n) and rjust(n) each place the string into a longer string of length n padded by spaces (centered, left-justified or right-justified respectively). zfill(n) works similarly but pads with leading zeros.
  • Stripping strings: often we want to remove spaces. This is achieved with strip(), lstrip() and rstrip(), which remove whitespace from both ends, just the left or just the right, respectively. An optional argument can be used to give a set of other characters to remove instead.
s="heLLo wORLd!"
print(s.capitalize(),"vs",s.title())
print("upper: '%s'"%s.upper(),"lower: '%s'"%s.lower(),"and swapped: '%s'"%s.swapcase())
print('|%s|' % "Hello World".center(30)) # center in 30 characters
print('|%s|'% "     lots of space             ".strip()) # remove leading and trailing whitespace
print('%s without leading/trailing d,h,L or ! = |%s|' % (s, s.strip("dhL!")))
print("Hello World".replace("World","Class"))
Hello world! vs Hello World!
upper: 'HELLO WORLD!' lower: 'hello world!' and swapped: 'HEllO WorlD!'
|         Hello World          |
|lots of space|
heLLo wORLd! without leading/trailing d,h,L or ! = |eLLo wOR|
Hello Class
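The padding and one-sided stripping methods listed above are not all exercised by the example, so here is a short sketch. The optional fill character of ljust()/rjust() and the sign handling of zfill() go slightly beyond the list above.

```python
# Padding: ljust/rjust/center take an optional fill character (default: space)
print("|%s|" % "abc".ljust(6))        # |abc   |
print("|%s|" % "abc".rjust(6, "."))   # |...abc|
print("42".zfill(5))                  # 00042
print("-42".zfill(5))                 # -0042: the sign stays in front
# One-sided stripping, of whitespace or of a given set of characters
print("|%s|" % "  padded  ".lstrip())  # |padded  |
print("xxhixx".rstrip("x"))            # xxhi
```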

Inspecting Strings

There are also lots of ways to inspect or check strings. Examples of a few of these are given here:

  • Checking the start or end of a string: startswith("string") and endswith("string") check whether the string starts/ends with the string given as argument
  • Capitalisation: There are boolean counterparts for all forms of capitalisation, such as isupper(), islower() and istitle()
  • Character type: does the string only contain the characters
    • 0-9: isdecimal(). Note that there are also isnumeric() and isdigit(), which behave the same except for certain Unicode characters
    • a-zA-Z: isalpha() or combined with digits: isalnum()
    • non-control code: isprintable() accepts anything except ‘\n’ and other ASCII control codes
    • \t\n \r (white space characters): isspace()
    • Suitable as variable name: isidentifier()
  • Find elements of string: s.count(w) gives the number of (non-overlapping) times w occurs in s, while s.find(w) and s.rfind(w) give the first and last position of the string w in s (or -1 if w does not occur).
s="Hello World"
print("The length of '%s' is"%s,len(s),"characters") # len() gives length
print(s.startswith("Hello") and s.endswith("World")) # check start/end
# count strings
print("There are %d 'l's but only %d World in %s" % (s.count('l'),s.count('World'),s))
print('"el" is at index',s.find('el'),"in",s) # indices count from 0
The length of 'Hello World' is 11 characters
True
There are 3 'l's but only 1 World in Hello World
"el" is at index 1 in Hello World
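Several of the inspection methods listed above are not used in the example. A brief sketch of some of them (the test strings are arbitrary):

```python
s = "Hello World"
print(s.rfind("o"), s.find("o"))   # 7 4: last and first 'o'
print(s.find("xyz"))               # -1: not found
print("2024".isdecimal(), "3.14".isdecimal())               # True False
print("abc".isalpha(), "abc1".isalpha(), "abc1".isalnum())  # True False True
print("my_var".isidentifier(), "2x".isidentifier())         # True False
print(" \t\n".isspace(), "a\nb".isprintable())              # True False
print("HELLO".isupper(), "Hello World".istitle())           # True True
```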

String comparison operations

Strings can be compared in lexicographical order (character by character, using Unicode code points) with the usual comparison operators. In addition, the in operator checks for substrings:

'abc' < 'bbc' <= 'bbc'
True
"ABC" in "This is the ABC of Python"
True
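Because the comparison is character by character, some results can be surprising. A sketch of a few cases; the key=str.lower trick for case-insensitive sorting is an addition not covered in the text.

```python
print("Z" < "a")            # True: upper-case letters come before lower-case
print(ord("Z"), ord("a"))   # 90 97
print("apple" < "apples")   # True: a prefix sorts first
print("10" < "9")           # True: digits are compared as characters, not numbers
words = ["banana", "apple", "Cherry"]
print(sorted(words))                 # ['Cherry', 'apple', 'banana']
print(sorted(words, key=str.lower))  # ['apple', 'banana', 'Cherry']
```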

Accessing parts of strings

Strings can be indexed with square brackets, and indexing starts from zero in Python. The len() function gives the length of a string.

s = '123456789'
print("The string '%s' is %d characters long" % (s, len(s)) )
print('First character of',s,'is',s[0])
print('Last character of',s,'is',s[len(s)-1])
The string '123456789' is 9 characters long
First character of 123456789 is 1
Last character of 123456789 is 9

Negative indices count from the end of the string, so s[-1] is the last character:

print('First character of',s,'is',s[-len(s)])
print('Last character of',s,'is',s[-1])
First character of 123456789 is 1
Last character of 123456789 is 9
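A sketch of the valid index range: positive and negative indices can refer to the same character, and an index outside the string raises an IndexError.

```python
s = '123456789'
# Valid indices run from 0 to len(s)-1, or from -len(s) to -1
print(s[2], s[-7])             # 3 3: both refer to the same position
try:
    s[len(s)]                  # one past the end
except IndexError as err:
    print("IndexError:", err)  # string index out of range
```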

Finally, a substring (range of characters) can be specified using $a:b$ to select the characters at indices $a,a+1,\ldots,b-1$. Note that the character at index $b$ is not included.

print("First three characters",s[0:3])
print("Next three characters",s[3:6])
First three characters 123
Next three characters 456

Omitting the start or the end of the range means the beginning or the end of the string, respectively:

print("First three characters", s[:3])
print("Last three characters", s[-3:])
First three characters 123
Last three characters 789
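A few further properties of slices, as a sketch. Out-of-range slices are cut short rather than raising an error, unlike single indices. The optional third number (the step) is an extension not covered in the text above, and the name `k` is illustrative.

```python
s = '123456789'
print(s[5:100])      # 6789: a slice running past the end is cut short
print(repr(s[7:3]))  # '': start after stop gives an empty string
# An optional third number gives a step
print(s[::2])        # 13579
print(s[::-1])       # 987654321 (reversed)
# s[:k] + s[k:] always rebuilds the whole string
k = 4
print(s[:k] + s[k:] == s)  # True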

Breaking apart strings

When processing text, the ability to split strings apart is particularly useful.

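  • partition(separator): breaks a string into three parts: the text before the first occurrence of the separator, the separator itself, and the text after it
  • split(): breaks a string into words separated by whitespace (optionally takes a separator as argument)
  • sep.join(list): joins a list of strings (such as the result of a split) using the string sep as the separator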
s = "one -> two  ->  three"
print( s.partition("->") )
print( s.split() )
print( s.split(" -> ") )
print( ";".join( s.split(" -> ") ) )
('one ', '->', ' two  ->  three')
['one', '->', 'two', '->', 'three']
['one', 'two ', ' three']
one;two ; three
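The difference between split() and split(" ") explains why the example above handles the double spaces differently. A sketch; the maxsplit argument and rpartition() are additions not mentioned in the list above.

```python
text = "a  b c"
print(text.split())       # ['a', 'b', 'c']: runs of whitespace act as one separator
print(text.split(" "))    # ['a', '', 'b', 'c']: every single space splits
print("a,b,c,d".split(",", 2))             # ['a', 'b', 'c,d']: at most 2 splits
print("path/to/file.txt".rpartition("/"))  # ('path/to', '/', 'file.txt'): last occurrence
print(", ".join(["x", "y", "z"]))          # x, y, z
```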

Strings are immutable

It is important to note that strings are constant, immutable values in Python. While new strings can easily be created, it is not possible to modify an existing string:

s='012345'
sX=s[:2]+'X'+s[3:] # this creates a new string with 2 replaced by X
print("creating new string",sX,"OK")
sX=s.replace('2','X') # the same thing
print(sX,"still OK")
s[2] = 'X' # an error!!!
creating new string 01X345 OK
01X345 still OK
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-93bf77b20e7d> in <module>
      4 sX=s.replace('2','X') # the same thing
      5 print(sX,"still OK")
----> 6 s[2] = 'X' # an error!!!

TypeError: 'str' object does not support item assignment
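Immutability does not prevent a variable from referring to a new string. The sketch below (names illustrative) shows that methods leave the original unchanged and that a single character can be "changed" by building a new string, here via a list of characters.

```python
s = "hello"
t = s.upper()      # string methods return a new string...
print(s, t)        # hello HELLO: ...and leave the original unchanged
s = s + "!"        # rebinding the name s to a new string is fine
print(s)           # hello!
# To change one character, build a new string from a list of characters
chars = list(s)
chars[0] = "J"
s = "".join(chars)
print(s)           # Jello!
```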

Advanced string processing

For more advanced string processing there are many libraries available in Python including for example:

  • re for regular expression based searching and splitting of strings
  • html for manipulating HTML format text
  • textwrap for reformatting ASCII text
  • … and many more
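As a brief taste of the three modules listed above, a sketch using one well-known function from each (re.split, html.escape and textwrap.wrap); these are only a small part of what the modules offer.

```python
import html
import re
import textwrap

s = "one -> two  ->  three"
# Split on "->" surrounded by any amount of whitespace
parts = re.split(r"\s*->\s*", s)
print(parts)        # ['one', 'two', 'three']
# Escape characters that have a special meaning in HTML
escaped = html.escape("<b>x & y</b>")
print(escaped)      # &lt;b&gt;x &amp; y&lt;/b&gt;
# Wrap text into lines of at most 10 characters
lines = textwrap.wrap("The quick brown fox jumps", width=10)
print(lines)        # ['The quick', 'brown fox', 'jumps']
```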