Jupyter Snippet P4M 02
All of these python notebooks are available at https://gitlab.erc.monash.edu.au/andrease/Python4Maths.git
Working with strings
Recall from the previous section that strings can be entered with single, double or triple quotes:
'All', "of", '''these''', """are valid strings"""
Unicode: Python supports unicode strings; however, for the most part this will be ignored here. If you are working in an editor that supports unicode you can use non-ASCII characters in strings (or even for variable names). Alternatively, typing something like
"\u00B3" will give you the string “³” (superscript-3).
The print() Function
As seen previously, the
print() function prints all of its arguments as strings, separated by spaces and followed by a linebreak:
- print("Hello World")
- print("Hello",'World')
- print("Hello", <Variable>)
The print() function has some optional arguments to control where and how to print. These include
sep, the separator (default: a space),
end, the end character (default: a linebreak), and
file, to write to a file. When writing to a file, setting the argument
flush=True may be useful to force the function to write the output immediately. Without this, Python may buffer the output, which improves the speed of repeated calls to print() but isn't helpful if, for example, you want to see the output immediately during debugging.
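As a minimal sketch of these optional arguments: the example below writes into an in-memory io.StringIO buffer instead of a real file so that it runs anywhere. The names buffer and captured are chosen here for illustration only.

```python
import io
import sys

# sep replaces the default space between arguments
print("a", "b", "c", sep="-")            # a-b-c

# end replaces the default linebreak, so the next print continues the line
print("no linebreak yet", end="")
print(" ... continued")

# file redirects the output; an in-memory buffer stands in for a file here
buffer = io.StringIO()
print("Hello", "World", sep=", ", end="!\n", file=buffer)
captured = buffer.getvalue()
print(repr(captured))                    # 'Hello, World!\n'

# flush=True asks Python to write the message out immediately
print("debug message", file=sys.stderr, flush=True)
```

With a real file, the same call would look like print("text", file=f) inside a with open("out.txt", "w") as f: block, where out.txt is a hypothetical file name.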
There are lots of methods for formatting and manipulating strings built into Python. Some of these are illustrated here.
String concatenation is the “addition” of two strings. Observe that while concatenating there will be no space between the strings.
string1='World'
string2='!'
print('Hello' + " " + string1 + string2)
The % operator is used to format a string by inserting the value that comes after it. It relies on the string containing a format specifier that identifies where to insert the value. The most common types of format specifiers are:
- %s -> string
- %d -> integer
- %f -> float
- %o -> octal
- %x -> hexadecimal
- %e -> exponential
These will be very familiar to anyone who has ever written a C or Java program and follow nearly exactly the same rules as the printf() function.
print("Hello %s" % string1)
print("Actual Number = %d" %18)
print("Float of the number = %f" %18)
print("Octal equivalent of the number = %o" %18)
print("Hexadecimal equivalent of the number = %x" %18)
print("Exponential equivalent of the number = %e" %18)

Hello World
Actual Number = 18
Float of the number = 18.000000
Octal equivalent of the number = 22
Hexadecimal equivalent of the number = 12
Exponential equivalent of the number = 1.800000e+01
When referring to multiple variables, parentheses are used. Values are inserted in the order they appear in the parentheses (more on tuples in the next section):
print("Hello %s %s. The meaning of life is %d" %(string1,string2,42))

Hello World !. The meaning of life is 42
We can also specify the width of the field and the number of decimal places to be used. For example:
print('Print width 10: |%10s|'%'x')
print('Print width 10: |%-10s|'%'x') # left justified
print("The number pi = %.2f to 2 decimal places"%3.1415)
print("More space pi = %10.2f"%3.1415)
print("Pad pi with 0 = %010.2f"%3.1415) # pad with zeros

Print width 10: |         x|
Print width 10: |x         |
The number pi = 3.14 to 2 decimal places
More space pi =       3.14
Pad pi with 0 = 0000003.14
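Field widths are mostly useful for lining up columns. The following is a small sketch that combines several specifiers in one format string. The items and prices are made-up illustration data.

```python
# %-12s : left-justified string in 12 characters
# %5d   : right-justified integer in 5 characters
# %8.2f : float in 8 characters with 2 decimal places
header = "%-12s|%5s|%8s" % ("item", "count", "price")
row1 = "%-12s|%5d|%8.2f" % ("apple", 3, 0.5)
row2 = "%-12s|%5d|%8.2f" % ("watermelon", 1, 3.25)

print(header)   # item        |count|   price
print(row1)     # apple       |    3|    0.50
print(row2)     # watermelon  |    1|    3.25
```

Because every row uses the same widths, the | separators end up in the same columns.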
Other String Methods
Multiplying a string by an integer simply repeats it
print("Hello World! "*5)
Hello World! Hello World! Hello World! Hello World! Hello World!
Strings can be transformed by a variety of functions that are all methods on a string. That is, they are called by putting the function name after the string, separated by a
. (for example s.upper()). They include:
- Upper vs lower case: upper(), lower(), capitalize(), title() and swapcase(), with mostly the obvious meaning. Note that capitalize() makes only the first letter of the string a capital (the rest becomes lower case), while title() selects upper case for the first letter of every word.
- Padding strings: center(n), ljust(n) and rjust(n) each place the string into a longer string of length n padded by spaces (centered, left-justified or right-justified respectively). zfill(n) works similarly but pads with leading zeros.
- Stripping strings: Often we want to remove whitespace. This is achieved with the functions strip(), lstrip() and rstrip(), which remove whitespace from both ends, just the left or just the right respectively. An optional argument can be used to list a set of other characters to be removed.
s="heLLo wORLd!"
print(s.capitalize(),"vs",s.title())
print("upper: '%s'"%s.upper(),"lower: '%s'"%s.lower(),"and swapped: '%s'"%s.swapcase())
print('|%s|' % "Hello World".center(30)) # center in 30 characters
print('|%s|'% " lots of space ".strip()) # remove leading and trailing whitespace
print('%s without leading/trailing d,h,L or ! = |%s|' % (s,s.strip("dhL!")))
print("Hello World".replace("World","Class"))

Hello world! vs Hello World!
upper: 'HELLO WORLD!' lower: 'hello world!' and swapped: 'HEllO WorlD!'
|         Hello World          |
|lots of space|
heLLo wORLd! without leading/trailing d,h,L or ! = |eLLo wOR|
Hello Class
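The padding and one-sided stripping methods are not exercised in the example above, so here is a short sketch of them. The variable names are chosen for illustration only.

```python
label = "42"

left = label.ljust(6)         # pad on the right with spaces
right = label.rjust(6)        # pad on the left with spaces
zeros = label.zfill(6)        # pad on the left with zeros
dots = label.center(6, ".")   # an optional second argument sets the fill character

print("|%s|" % left)          # |42    |
print("|%s|" % right)         # |    42|
print("|%s|" % zeros)         # |000042|
print("|%s|" % dots)          # |..42..|

padded = "  padded  "
only_left = padded.lstrip()   # removes leading whitespace only
only_right = padded.rstrip()  # removes trailing whitespace only
print("|%s|" % only_left)     # |padded  |
print("|%s|" % only_right)    # |  padded|
```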
There are also lots of ways to inspect or check strings. Examples of a few of these are given here:
- Checking the start or end of a string: startswith("string") and endswith("string") check if it starts/ends with the string given as argument
- Capitalisation: There are boolean counterparts for all forms of capitalisation, such as isupper(), islower() and istitle()
- Character type: does the string only contain the characters
  - 0-9: isdecimal(). Note there is also isnumeric() and isdigit(), which are effectively the same function except for certain unicode characters
  - a-zA-Z: isalpha(), or combined with digits: isalnum()
  - non-control code: isprintable() accepts anything except '\n' and other ASCII control codes
  - \t\n \r (white space characters): isspace()
  - Suitable as variable name: isidentifier()
- Find elements of string: s.count(w) finds the number of times w occurs in s, while s.find(w) and s.rfind(w) find the first and last position of the string w in s.
s="Hello World"
print("The length of '%s' is"%s,len(s),"characters") # len() gives length
s.startswith("Hello") and s.endswith("World") # check start/end (result is not printed)
# count strings
print("There are %d 'l's but only %d World in %s" % (s.count('l'),s.count('World'),s))
print('"el" is at index',s.find('el'),"in",s) # index counts from 0; find() returns -1 if not found

The length of 'Hello World' is 11 characters
There are 3 'l's but only 1 World in Hello World
"el" is at index 1 in Hello World
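A few of the checking methods listed above, sketched with their results. The example strings are chosen here for illustration. The superscript-3 character shows one case where isdecimal() and isdigit() differ.

```python
print("Hello World".istitle())     # True
print("HELLO".isupper())           # True

print("2023".isdecimal())          # True
print("\u00B3".isdecimal())        # False: superscript-3 is not a decimal digit
print("\u00B3".isdigit())          # True:  but it does count as a digit

print("abc123".isalnum())          # True
print("abc 123".isalnum())         # False: the space is neither letter nor digit
print(" \t\n".isspace())           # True
print("line\n".isprintable())      # False: contains a linebreak

print("my_var1".isidentifier())    # True
print("1st".isidentifier())        # False: names cannot start with a digit

s = "Hello World"
first_o = s.find("o")              # first position of "o"
last_o = s.rfind("o")              # last position of "o"
not_found = s.find("xyz")          # -1 signals "not found"
print(first_o, last_o, not_found)  # 4 7 -1
```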
String comparison operations
Strings can be compared in lexicographical order with the usual comparisons. In addition the
in operator checks for substrings:
'abc' < 'bbc' <= 'bbc'
"ABC" in "This is the ABC of Python"
Accessing parts of strings
Strings can be indexed with square brackets. Indexing starts from zero in Python, and the
len() function provides the length of a string:
s = '123456789'
print("The string '%s' string is %d characters long" % (s, len(s)) )
print('First character of',s,'is',s[0])
print('Last character of',s,'is',s[len(s)-1])

The string '123456789' string is 9 characters long
First character of 123456789 is 1
Last character of 123456789 is 9
Negative indices can be used to start counting from the back:
print('First character of',s,'is',s[-len(s)])
print('Last character of',s,'is',s[-1])

First character of 123456789 is 1
Last character of 123456789 is 9
Finally, a substring (range of characters) can be specified using $a:b$ to select the characters at index $a,a+1,\ldots,b-1$. Note that the character at index $b$ is not included.
print("First three characters",s[0:3])
print("Next three characters",s[3:6])

First three characters 123
Next three characters 456
An empty beginning and end of the range denotes the beginning/end of the string:
print("First three characters", s[:3])
print("Last three characters", s[-3:])

First three characters 123
Last three characters 789
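Because the end index is excluded, a slice s[a:b] always contains b - a characters, and splitting a string at any index k gives two pieces that join back into the original. A small sketch of both properties, with a, b and k chosen arbitrarily:

```python
s = '123456789'

a, b = 2, 5
middle = s[a:b]                    # characters at index 2, 3 and 4
print(middle)                      # 345
print(len(middle) == b - a)        # True

k = 4
head, tail = s[:k], s[k:]          # split at index k without overlap or gap
print(head, tail)                  # 1234 56789
print(head + tail == s)            # True

# a negative start counts from the back, so these two slices agree
print(s[-3:] == s[len(s)-3:])      # True
```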
Breaking apart strings
When processing text, the ability to split strings apart is particularly useful.
- partition(separator): breaks a string into three parts (before, separator, after) at the first occurrence of the separator
- split(): breaks a string into words separated by white-space (optionally takes a separator as argument)
- join(): joins the result of a split, using the string it is called on as the separator
s = "one -> two  ->  three"
print( s.partition("->") )
print( s.split() )
print( s.split(" -> ") )
print( ";".join( s.split(" -> ") ) )

('one ', '->', ' two  ->  three')
['one', '->', 'two', '->', 'three']
['one', 'two ', ' three']
one;two ; three
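A few further cases that often come up when splitting, sketched with made-up example strings: unpacking the result of partition(), what happens when the separator is missing, and how split() with and without an argument treats repeated separators.

```python
# partition() always returns exactly three parts, so it can be unpacked
record = "name=Ada Lovelace"
key, sep, value = record.partition("=")
print(key, "|", value)             # name | Ada Lovelace

# if the separator is not found, the last two parts are empty strings
missing = "no separator here".partition("=")
print(missing)                     # ('no separator here', '', '')

# split() without argument ignores repeated and surrounding whitespace
words = "  several   spaced   words ".split()
print(words)                       # ['several', 'spaced', 'words']
print(" ".join(words))             # several spaced words

# split() with an explicit separator keeps empty fields
parts = "a,b,,c".split(",")
print(parts)                       # ['a', 'b', '', 'c']
```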
Strings are immutable
It is important to note that strings are constant, immutable values in Python. While new strings can easily be created, it is not possible to modify a string:
s='012345'
sX=s[:2]+'X'+s[3:] # this creates a new string with 2 replaced by X
print("creating new string",sX,"OK")
sX=s.replace('2','X') # the same thing
print(sX,"still OK")
s[2] = 'X' # an error!!!

creating new string 01X345 OK
01X345 still OK
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-93bf77b20e7d> in <module>
      4 sX=s.replace('2','X') # the same thing
      5 print(sX,"still OK")
----> 6 s[2] = 'X' # an error!!!

TypeError: 'str' object does not support item assignment
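If replacing a character by position is needed often, the slicing idiom can be wrapped in a small helper. The function replace_at below is a hypothetical name introduced for this sketch, not a built-in. The try/except block shows the failed assignment without stopping the program.

```python
def replace_at(text, index, char):
    """Return a new string equal to text with the character at index replaced."""
    return text[:index] + char + text[index + 1:]

s = '012345'
t = replace_at(s, 2, 'X')
print(t)    # 01X345
print(s)    # 012345 (the original string is unchanged)

# direct item assignment is rejected because strings are immutable
try:
    s[2] = 'X'
    assignment_failed = False
except TypeError as err:
    assignment_failed = True
    print("TypeError:", err)
```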
Advanced string processing
For more advanced string processing there are many libraries available in Python including for example:
- re for regular expression based searching and splitting of strings
- html for manipulating HTML format text
- textwrap for reformatting ASCII text
- … and many more
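To give a flavour of these libraries, here is a brief sketch using one or two functions from each. The input strings are made up for illustration, and each module offers far more than is shown.

```python
import html
import re
import textwrap

# re: split on "->" together with any surrounding whitespace
s = "one -> two  ->  three"
fields = re.split(r"\s*->\s*", s)
print(fields)                      # ['one', 'two', 'three']

# re: find all runs of digits
numbers = re.findall(r"\d+", "3 apples and 12 pears")
print(numbers)                     # ['3', '12']

# html: escape characters that have a special meaning in HTML
escaped = html.escape("1 < 2 & 3 > 2")
print(escaped)                     # 1 &lt; 2 &amp; 3 &gt; 2

# textwrap: re-flow a long line so that no line exceeds the given width
text = "There are lots of methods for formatting and manipulating strings built into Python."
wrapped = textwrap.fill(text, width=30)
print(wrapped)
```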