Strings 2 - operators
Download exercises zip
Python offers several operators to work with strings:
Operator | Syntax | Result | Meaning
---|---|---|---
len | len(str) | int | Returns the length of the string
indexing | str[int] | str | Reads the character at the specified index
concatenation | str + str | str | Concatenates two strings
inclusion | str in str | bool | Checks whether a string is contained inside another one
slice | str[int:int] | str | Extracts a sub-string
equality | ==, != | bool | Checks whether strings are equal or different
comparisons | <, <=, >, >= | bool | Performs lexicographic comparison
ord | ord(str) | int | Returns the order of a character
chr | chr(int) | str | Given an order, returns the corresponding character
replication | str * int | str | Replicates the string
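As a quick preview, here is a minimal sketch which tries each operator once. The variable names (word, piece, twice and so on) are only illustrative and do not come from the exercises:

```python
# 'word' is just an example value
word = 'park'

length = len(word)        # number of characters
first = word[0]           # indexing
joined = word + 'ing'     # concatenation
found = 'ar' in word      # inclusion
piece = word[1:3]         # slice: index 1 included, index 3 excluded
same = word == 'park'     # equality
before = word < 'pool'    # lexicographic comparison
code = ord('p')           # order of a character
char = chr(112)           # character with the given order
twice = word * 2          # replication

print(length, first, joined, found, piece)
print(same, before, code, char, twice)
```

Each operator is explained in detail in the sections that follow.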
What to do
Unzip exercises zip in a folder, you should obtain something like this:
strings
strings1.ipynb
strings1-sol.ipynb
strings2.ipynb
strings2-sol.ipynb
strings3.ipynb
strings3-sol.ipynb
strings4.ipynb
strings4-sol.ipynb
strings5-chal.ipynb
jupman.py
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !
Open Jupyter Notebook from that folder. Two things should open: first a console, then a browser. The browser should show a file list: navigate the list and open the notebook
strings2.ipynb
Go on reading the exercises file: sometimes you will find paragraphs marked EXERCISE which will ask you to write Python commands in the following cells.
Shortcut keys:
to execute Python code inside a Jupyter cell, press
Control + Enter
to execute Python code inside a Jupyter cell AND select next cell, press
Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press
Alt + Enter
If the notebooks look stuck, try to select
Kernel -> Restart
Reading characters
A string is a sequence of characters, and often we might want to access a single character by specifying the position of the character we are interested in.
It’s important to remember that the positions of characters in a string start from 0. To read the character at a certain position, we write the string followed by square brackets with the position inside. Examples:
[2]:
'park'[0]
[2]:
'p'
[3]:
'park'[1]
[3]:
'a'
[4]:
#0123
'park'[2]
[4]:
'r'
[5]:
#0123
'park'[3]
[5]:
'k'
If we try to go beyond the last character, we will get an error:
#0123
'park'[4]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-106-b8f1f689f0c7> in <module>
1 #0123
----> 2 'park'[4]
IndexError: string index out of range
So far we have indexed a string written as a literal, but we can also use variables:
[6]:
#01234
x = 'cloud'
[7]:
x[0]
[7]:
'c'
[8]:
x[2]
[8]:
'o'
How is the character we’ve just read represented? As you may have noticed, it is between quotes as if it were a string. Let’s check:
[9]:
type(x[0])
[9]:
str
It really is a string. This might come as a surprise, also from a philosophical standpoint: Python strings are made of… strings! Other programming languages may use a specific type for single characters, but Python uses strings, which lets it better manage complex alphabets such as, for example, Japanese.
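To make this concrete, here is a small sketch with a Japanese word. The word itself is just an example we picked, and the names word and first are illustrative:

```python
word = '日本語'        # three Japanese characters
first = word[0]

print(len(word))      # 3
print(first)          # 日
print(type(first))    # <class 'str'>
print(len(first))     # 1
```

Each position holds one character, and reading it gives back a string of length 1.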
QUESTION: Let’s suppose x
is any string. If we try to execute this code:
x[0]
we will get:
always a character
always an error
sometimes a character, sometimes an error according to the string
QUESTION: Let’s suppose x
is an empty string. If we try to execute this code:
x[len(x)]
we will get:
always a character
always an error
sometimes a character, sometimes an error according to the string at hand
Exercise - alternate
Given two strings both of length 3
, print a string which alternates characters from both strings. Your code must work with any strings of this length
Example - given:
x="say"
y="hi!"
it should print:
shaiy!
[10]:
# write here
shaiy!
Negative indexes
In Python we can also use negative indexes, which instead of starting from the beginning start from the end:
[11]:
#4321
"park"[-1]
[11]:
'k'
[12]:
#4321
"park"[-2]
[12]:
'r'
[13]:
#4321
"park"[-3]
[13]:
'a'
[14]:
#4321
"park"[-4]
[14]:
'p'
If we go one step beyond, we get an error:
#4321
"park"[-5]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-126-668d8a13a324> in <module>
----> 1 "park"[-5]
IndexError: string index out of range
QUESTION: Suppose x
is a NON-empty string. What do we get with the following expression?
x[-len(x)]
always a character
always an error
sometimes a character, sometimes an error according to the string
QUESTION: Suppose x
is some string (possibly empty). Are the expressions
x[len(x) - 1]
and
x[-len(x) - 1]
equivalent? What do they do?
QUESTION: If x
is a non-empty string, what does the following expression produce? Can we simplify it to a shorter one?
(x + x)[len(x)]
QUESTION: What does the following expression produce? An error? Something else? Can we simplify it?
'park'[0][0]
QUESTION: If x
is a non-empty string, what does the following expression produce? An error? Something else? Can we simplify it?
(x[0])[0]
Substitute characters
We said strings in Python are immutable. Suppose we have a string like this:
[15]:
#0123
x = 'port'
and, for example, we want to change the character at position 2
(in this case, the r
) into an s
. What do we do?
We might be tempted to write something like the following, but Python would punish us with an error:
x[2] = 's'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-113-e5847c6fa4bf> in <module>
----> 1 x[2] = 's'
TypeError: 'str' object does not support item assignment
The correct solution is assigning a completely new string to x
, obtained by taking pieces from the previous one:
[16]:
x = x[0] + x[1] + 's' + x[3]
[17]:
x
[17]:
'post'
If seeing x
to the right of the equal sign baffles you, we can decompose the code like this and it will work the same way:
[18]:
x = "port"
y = x
x = y[0] + y[1] + 's' + y[3]
Try it in Python Tutor:
[19]:
x = "port"
y = x
x = y[0] + y[1] + 's' + y[3]
jupman.pytut()
[19]:
Slices
We might want to read only a subsequence which starts at one position and ends at another. For example, suppose we have:
[20]:
#0123456789
x = 'mercantile'
and we want to extract the string 'canti'
, which starts at index 3 included. We might extract the single characters and concatenate them with the +
sign, but we would write a lot of code. A better option is to use so-called slices: simply write the string followed by square brackets containing the start index (included), a colon, and finally the end index (excluded):
[21]:
#0123456789
x = 'mercantile'
x[3:8] # note the : between start and end indexes
[21]:
'canti'
WARNING: Extracting with slices DOES NOT modify the original string !!
Let’s see an example:
[22]:
#0123456789
x = 'mercantile'
print(' x is', x)
print('The slice x[3:8] is', x[3:8])
print(' x is', x) # note x continues to point to old string!
x is mercantile
The slice x[3:8] is canti
x is mercantile
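A handy consequence of the 'start included, end excluded' rule can be sketched as follows: when both indexes fall inside the string and start is not greater than end, the slice has exactly end - start characters. The names start, end and piece are illustrative:

```python
x = 'mercantile'
start = 3
end = 8
piece = x[start:end]      # a NEW string, x is left untouched

print(piece)                        # canti
print(len(piece) == end - start)    # True
print(x)                            # mercantile
```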
QUESTION: if x
is any string of length at least 5
, what does this code produce? An error? Does it work? Can we shorten it?
x[3:4]
Exercise - garalampog
Write some code to extract and print alam
from the string "garalampog"
. Try guessing the correct indexes.
[23]:
x = "garalampog"
# write here
alam
Exercise - ifEweEfav lkSD lkWe
Write some code to extract and print kD
from the string "ifE\te\nfav lkD lkWe"
. Be careful with spaces and special characters (you might want to print x
first). Try guessing the correct indexes.
[24]:
x = "ifE\te\nfav lkD lkWe"
# write here
kD
Slices - limits
Whenever we use a slice we must be careful with index limits. Let’s see how they behave:
[25]:
#012345
"chair"[0:3] # from index 0 *included* to 3 *excluded*
[25]:
'cha'
[26]:
#012345
"chair"[0:4] # from index 0 *included* to 4 *excluded*
[26]:
'chai'
[27]:
#012345
"chair"[0:5] # from index 0 *included* to 5 *excluded*
[27]:
'chair'
[28]:
#012345
"chair"[0:6] # if we go beyond string length Python doesn't complain
[28]:
'chair'
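The tolerance for limits beyond the string length holds for slices only, not for plain indexes. The sketch below contrasts the two. It uses try / except, a construct not covered in this notebook, only so the example keeps running after the error; the variable names are illustrative:

```python
word = 'chair'

beyond = word[3:100]      # the end limit is simply cut at the string length
far_away = word[10:20]    # both limits beyond the end: empty string, no error

try:
    word[10]              # a plain index beyond the end is an error
    index_result = 'no error'
except IndexError:
    index_result = 'IndexError'

print(beyond)         # ir
print(far_away == '') # True
print(index_result)   # IndexError
```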
QUESTION: if x
is any string (also empty), what does this expression do? Can it give an error? Does it return something useful?
x[0:len(x)]
Slice - Omitting limits
If we want, we can omit the starting index, in which case Python will assume it is 0:
[29]:
#0123456789
"catamaran"[:3]
[29]:
'cat'
It’s also possible to omit the ending index, in that case Python will extract until the end of the string:
[30]:
#0123456789
"catamaran"[3:]
[30]:
'amaran'
By omitting both indexes we obtain the full string:
[31]:
"catamaran"[:]
[31]:
'catamaran'
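Omitted limits make it easy to cut a string in two at some position. As a small sketch (k, head and tail are names chosen for illustration), the two pieces always rebuild the original string:

```python
x = 'catamaran'
k = 3

head = x[:k]      # from the beginning up to index k excluded
tail = x[k:]      # from index k included up to the end

print(head, tail)          # cat amaran
print(head + tail == x)    # True
```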
Exercise - ysterymyster
Write some code that given a string x
prints the string composed with all the characters of x
except the first one, followed by all characters of x
except the last one.
Your code must work with any string
Example 1 - given:
x = "mystery"
must print:
ysterymyster
Example 2 - given:
x = "rope"
must print:
operop
[32]:
x = "mystery"
#x = "rope"
# write here
Slice - negative limits
If we want, it’s also possible to set negative limits, although it’s not always intuitive:
[33]:
#0123456
"vegetal"[3:0] # from index 3 to positive indexes <= 3 doesn't produce anything
[33]:
''
[34]:
#0123456
"vegetal"[3:1] # from index 3 to positive indexes <= 3 doesn't produce anything
[34]:
''
[35]:
#0123456
"vegetal"[3:2] # from index 3 to positive indexes <= 3 doesn't produce anything
[35]:
''
[36]:
#0123456
"vegetal"[3:3] # from index 3 to positive indexes <= 3 doesn't produce anything
[36]:
''
Let’s see what happens with negative indexes:
[37]:
#0123456 positive indexes
#7654321 negative indexes
"vegetal"[3:-1]
[37]:
'eta'
[38]:
#0123456 positive indexes
#7654321 negative indexes
"vegetal"[3:-2]
[38]:
'et'
[39]:
#0123456 positive indexes
#7654321 negative indexes
"vegetal"[3:-3]
[39]:
'e'
[40]:
#0123456 positive indexes
#7654321 negative indexes
"vegetal"[3:-4]
[40]:
''
[41]:
#0123456 positive indexes
#7654321 negative indexes
"vegetal"[3:-5]
[41]:
''
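One way to read a negative limit, sketched here under the assumption that it does not go beyond the beginning of the string, is to add the string length to it. The names a, b and c are illustrative:

```python
x = 'vegetal'

a = x[3:-2]
b = x[3:len(x) - 2]   # -2 corresponds to index len(x) - 2, that is 5

print(a, b, a == b)   # et et True

c = x[-5:-2]          # both limits negative: same as x[2:5]
print(c, c == x[2:5]) # get True
```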
Exercise - javarnanda
Given a string x
, write some code to extract and print its first 3 characters followed by its last 3.
Your code should work for any string of length equal to or greater than 3
Example 1 - given:
x = "javarnanda"
it should print:
javnda
Example 2 - given:
x = "bang"
it should print:
banang
[42]:
x = "javarnanda"
#x = "bang"
# write here
javnda
Slice - modifying
Suppose we have the string
[43]:
#0123456789
s = "the table is placed in the center of the room"
and we want to reassign s
so that it becomes associated with the string:
#0123456789
"the chair is placed in the center of the room"
Since both strings are similar, we might be tempted to only redefine the character sequence which corresponds to the word "table"
, which goes from index 4
included to index 9
excluded:
s[4:9] = "chair" # WARNING! WRONG!
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-57-0de7363c6882> in <module>
----> 1 s[4:9] = "chair" # WARNING! WRONG!
TypeError: 'str' object does not support item assignment
Sadly, we would receive an error because, as repeated many times, strings are IMMUTABLE, so we cannot select a chunk of a string and try to change the original. What we can do instead is build a NEW string from pieces of the original string, concatenate the desired characters, and associate the result with the variable whose assignment we want to modify:
[44]:
#0123456789
s = "the table is placed in the center of the room"
s = s[0:4] + "chair" + s[9:]
print(s)
the chair is placed in the center of the room
When Python finds the line
s = s[0:4] + "chair" + s[9:]
FIRST it calculates the result on the right of the =
, and THEN associates the result with the variable on the left. In the expression on the right only NEW strings are generated, which once built can be assigned to the variable s.
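Instead of writing the end index 9 by hand, we could compute it from the length of the word to replace. This is only a sketch: the names old, new, start, end and result are illustrative, and the start position 4 is still assumed to be known:

```python
s = "the table is placed in the center of the room"
old = "table"
new = "chair"

start = 4                  # position where 'table' begins
end = start + len(old)     # 9, first index after 'table'

result = s[:start] + new + s[end:]   # a NEW string built from pieces of s

print(s[start:end] == old)   # True
print(result)                # the chair is placed in the center of the room
print(s)                     # the table is placed in the center of the room
```

Here the result goes into a separate variable, so we can see that the original string is still intact.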
Exercise - the run
Write some code such that when given the string s
s = 'The Gold Rush has begun.'
and some variables
what = 'Atom'
happened = 'is over'
substitutes the substring 'Gold'
with the string in the variable what
and substitutes the substring 'has begun'
with the string in the variable happened
.
After executing your code, the string associated to s
should be
>>> print(s)
The Atom Rush is over.
DON’T use constant characters in your code, e.g. dots
'.'
aren’t allowed !
[45]:
#01234567890123456789012345678
s = 'The Gold Rush has begun.'
what = 'Atom'
happened = 'is over'
# write here
The Atom Rush is over.
Inclusion operator
To check if a string is included in another one, we use the in
operator.
Note the result of this expression is a boolean:
[46]:
'the' in 'Singing in the rain'
[46]:
True
[47]:
'si' in 'Singing in the rain' # in operator is case-sensitive
[47]:
False
[48]:
'Si' in 'Singing in the rain'
[48]:
True
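A few more cases are worth trying. The sketch below also uses not in, the negated form of the operator, which this notebook does not otherwise discuss; the variable names are illustrative:

```python
title = 'Singing in the rain'

a = 'rain' in title        # True
b = 'snow' not in title    # True: 'snow' never appears
c = 'in the' in title      # True: spaces are characters like any other
d = '' in title            # True: the empty string is contained in every string

print(a, b, c, d)
```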
Do not abuse in
WARNING: in
is often used in a wrong / inefficient way
Always ask yourself:
Could the string not contain the substring we’re looking for? Always remember to handle this case too!
in performs a search on the whole string, which might be inefficient: is it really necessary, or do we already know the interval where to search?
If we want to know whether character is in a position we know a priori (e.g. 3), in is not needed: it’s enough to write my_string[3] == character. By using in, Python might find duplicated characters which are before or after the one we want to verify!
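The difference can be sketched with a word that has repeated characters. The value 'banana' is just an example, and the names anywhere, at_3 and in_interval are illustrative:

```python
#            012345
my_string = 'banana'
character = 'n'

anywhere = character in my_string           # True: there is an 'n' somewhere
at_3 = my_string[3] == character            # False: at index 3 there is an 'a'
in_interval = character in my_string[0:2]   # False: searches only inside 'ba'

print(anywhere, at_3, in_interval)
```

The first expression answers a different question from the second: it finds the 'n' at index 2, even though index 3 holds another character.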
Exercise - contained 1
You are given two strings x
and y
, and a third z
. Write some code which prints True
if x
and y
are both contained in z
.
Example 1 - given:
x = 'cad'
y = 'ra'
z = 'abracadabra'
it should print:
True
Example 2 - given:
x = 'zam'
y = 'ra'
z = 'abracadabra'
it should print:
False
[49]:
x,y,z = 'cad','ra','abracadabra' # True
#x,y,z = 'zam','ra','abracadabra' # False
# write here
Exercise - contained 2
Given three strings x
, y
, z
, write some code which prints True
if the string x
is contained in at least one of the strings y
or z
, otherwise prints False
Your code should work with any set of strings
Example 1 - given:
x = "ope"
y = "honesty makes for long friendships"
z = "I hope it's clear enough"
it should print:
True
Example 2 - given:
x = "nope"
y = "honesty makes for long friendships"
z = "I hope it's clear enough"
it should print:
False
Example 3 - given:
x = "cle"
y = "honesty makes for long friendships"
z = "I hope it's clear enough"
it should show:
True
[50]:
x,y,z = "ope","honesty makes for long friendships","I hope it's clear enough" # True
#x,y,z = "nope","honesty makes for long friendships","I hope it's clear enough" # False
#x,y,z = "cle","honesty makes for long friendships","I hope it's clear enough" # True
# write here
Comparisons
Python offers us the possibility to perform a lexicographic comparison among strings, like we would when placing names in an address book. Although sorting names is something intuitive we often do, we must be careful about special cases.
First, let’s determine when two strings are equal.
Equality operators
To check whether two strings are equal, you can use the operator ==
which produces as a result the boolean True
or False
WARNING: ==
is written with TWO equal signs !!!
[51]:
"dog" == "dog"
[51]:
True
[52]:
"dog" == "wolf"
[52]:
False
The equality operator is case-sensitive:
[53]:
"dog" == "DOG"
[53]:
False
To check whether two strings are NOT equal, we can use the operator !=
, which we can expect to behave exactly as the opposite of ==
:
[54]:
"dog" != "dog"
[54]:
False
[55]:
"dog" != "wolf"
[55]:
True
[56]:
"dog" != "DOG"
[56]:
True
As an alternative, we might use the operator not
:
[57]:
not "dog" == "dog"
[57]:
False
[58]:
not "wolf" == "dog"
[58]:
True
[59]:
not "dog" == "DOG"
[59]:
True
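Note that not applies to the result of the whole comparison, not to the first string alone. A small sketch showing that the three ways of writing the check give the same result (the variable names are illustrative):

```python
a = 'dog'
b = 'DOG'

with_not = not a == b         # Python reads it as: not (a == b)
with_parens = not (a == b)
with_neq = a != b

print(with_not, with_parens, with_neq)    # True True True
```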
QUESTION: what does the following code print?
x = "river" == "river"
print(x)
QUESTION: for each of the following expressions, try to guess whether it produces True
or False
'hat' != 'Hat'
'hat' == 'HAT'
'choralism'[2:5] == 'contemporary'[7:10]
'AlAbAmA'[4:] == 'aLaBaMa'
'bright'[9:20] == 'dark'[10:15]
'optical'[-1] == 'crystal'[-1]
('hat' != 'jacket') == ('trousers' != 'bow')
('stra' in 'stradivarius') == ('div' in 'digital divide')
len('note') in '5436'
str(len('note')) in '5436'
len('posters') in '5436'
str(len('posters')) in '5436'
Exercise - statist
Write some code which prints True
if a word
begins with the same two characters it ends with.
Your code should work for any
word
[60]:
word = 'statist' # True
#word = 'baobab' # False
#word = 'maxima' # True
#word = 'karma' # False
# write here
Comparing characters
Characters have an inherent order we can exploit. Let’s see an example:
[61]:
'a' < 'g'
[61]:
True
another one:
[62]:
'm' > 'c'
[62]:
True
These look like reasonable comparisons! But what about this (notice the capital 'Z'
)?
[63]:
'a' < 'Z'
[63]:
False
Maybe this doesn’t look so obvious. And what if we get creative and compare with symbols such as square brackets or Unicode hearts?
[64]:
'a' > '♥'
[64]:
False
To determine how to deal with these special cases, we must remember that ASCII assigns a position number to each character, which in effect defines an ordering among all its characters.
If we want to know the corresponding number of a character, we can use the function ord
:
[65]:
ord('a')
[65]:
97
[66]:
ord('b')
[66]:
98
[67]:
ord('z')
[67]:
122
If we want to go the other way, given a position number we can obtain the corresponding character with the chr
function:
[68]:
chr(97)
[68]:
'a'
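Since ord gives a plain integer, we can do arithmetic on it before going back to a character with chr. A minimal sketch, with illustrative variable names:

```python
code = ord('a')                   # 97
back = chr(code)                  # chr undoes ord
next_letter = chr(code + 1)       # the character right after 'a'
distance = ord('e') - ord('a')    # how many positions 'e' is after 'a'

print(back, next_letter, distance)    # a b 4
```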
Uppercase characters have different positions:
[69]:
ord('A')
[69]:
65
[70]:
ord('Z')
[70]:
90
EXERCISE: Using the functions above, try to find which characters are between capital Z
and lowercase a
[71]:
# write here
The ordering allows us to perform lexicographic comparisons between single characters:
[72]:
'a' < 'b'
[72]:
True
[73]:
'g' >= 'm'
[73]:
False
EXERCISE: Write some code that:
prints the ord values of 'A', 'Z' and a given char
prints True if char is uppercase, and False otherwise
Would your code also work with accented capitalized characters such as 'Á'?
NOTE: the possible character sets are way too many, so the proper solution would be to use the method isupper we will see in the next tutorial.
[74]:
char = 'G' # True
#char = 'g' # False
#char = 'Á' # True ?? Note the accent!
# write here
Also, since the Unicode character set includes ASCII, the ordering of ASCII characters can be used to safely compare them against Unicode characters, so comparing characters or their ord
should be always equivalent:
[75]:
ord('a') # ascii
[75]:
97
[76]:
ord('♥') # unicode
[76]:
9829
[77]:
'a' > '♥'
[77]:
False
[78]:
ord('a') > ord('♥')
[78]:
False
Python also offers lexicographic comparisons on strings with more than one character. To understand what the expected result should be, we must distinguish among several cases, though:
strings of equal / different length
strings with same / mixed case
Let’s begin with same length strings:
[79]:
'mario' > 'luigi'
[79]:
True
[80]:
'mario' > 'wario'
[80]:
False
[81]:
'Mario' > 'Wario'
[81]:
False
[82]:
'Wario' < 'mario' # capital case is *before* lowercase in ASCII
[82]:
True
Comparing different lengths
A string which is a prefix of a longer one comes first in the ordering:
[83]:
'troll' < 'trolley'
[83]:
True
If two strings only share a prefix, Python compares the characters right after the common prefix: in this case it detects that e
precedes the corresponding s
:
[84]:
'trolley' < 'trolls'
[84]:
True
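We can retrace by hand what Python does in this last comparison. The sketch below hard-codes 5 as the length of the common prefix 'troll', which is an assumption valid only for these two words:

```python
a = 'trolley'
b = 'trolls'

same_prefix = a[:5] == b[:5]    # both start with 'troll'
decisive = a[5] < b[5]          # first differing characters: 'e' and 's'

print(same_prefix)       # True
print(a[5], b[5])        # e s
print(decisive)          # True
print(a < b)             # True
```

The result of the whole comparison is decided by the first position where the two strings differ.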
Exercise - Character intervals
You are given two strings i1
and i2
of two characters each.
We suppose they represent character intervals: the first character of an interval always has an order number lower than or equal to the second.
There are five possibilities: the first interval ‘is contained in’, ‘contains’, ‘overlaps’, ‘is before’ or ‘is after’ the second interval. Write some code which tells which relation holds.
Example 1 - given:
i1 = 'gm'
i2 = 'cp'
Your program should print:
gm is contained in cp
To see why, you can look at this little representation (you don’t need to print this!):
c g m p
abcdefghijklmnopqrstuvwxyz
Example 2 - given:
i1 = 'mr'
i2 = 'pt'
Your program should print:
mr overlaps pt
because mr
is neither contained in, nor contains, nor completely precedes, nor completely follows pt
(you don’t need to print this!):
m p r t
abcdefghijklmnopqrstuvwxyz
If the i1 interval coincides with i2, it is considered as containing i2
DO NOT use loops nor if
HINT: to satisfy the above constraint, think about the evaluation order of booleans, for example the expression
'g' >= 'c' and 'm' <= 'p' and 'is contained in'
produces as result the string 'is contained in'
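The hint relies on the fact that and and or return one of their operands rather than always a plain boolean. Here is a minimal sketch of that behaviour, with made-up strings 'yes', 'fallback', 'first' and 'second' used only for illustration:

```python
r1 = True and 'yes'          # and returns the last operand if all are true
r2 = False and 'yes'         # and stops at the first false operand
r3 = False or 'fallback'     # or returns the first true operand

r4 = ('a' < 'b' and 'first') or 'second'
r5 = ('b' < 'a' and 'first') or 'second'

print(r1, r2, r3)    # yes False fallback
print(r4, r5)        # first second
```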
[85]:
i1,i2 = 'gm','cp' # gm is contained in cp
#i1,i2 = 'dh','dh' # dh contains dh #(special case)
#i1,i2 = 'bw','dq' # bw contains dq
#i1,i2 = 'ac','bd' # ac overlaps bd
#i1,i2 = 'mr','pt' # mr overlaps pt
#i1,i2 = 'fm','su' # fm is before su
#i1,i2 = 'xz','pq' # xz is after pq
# write here
Exercise - The Library of Encodicus
In the study room of the algorithmist Encodicus there is a bookshelf divided in 26 alphabetically ordered sections, where he scrupulously keeps his precious alchemical texts. Every section can contain at most 9 books. One day, Encodicus decides to acquire a new tome for his collection: write some code which given a string representing bookshelf
with the counts of the books and a new book
, finds the right position of the book and updates bookshelf
accordingly
assume no section contains 9 books
assume book names are always lowercase
DO NOT use loops, if, nor string methods
DO NOT manually write strings with 26 characters, or even worse create 26 variables
USE ord to find the section position
Example - given:
bookshelf = "|a 7|b 5|c 5|d 8|e 2|f 0|g 4|h 8|i 7|j 1|k 6|l 0|m 5|n 0|o 3|p 7|q 2|r 2|s 4|t 6|u 1|v 3|w 3|x 5|y 7|z 6|"
book = "cycling in the wild"
after your code runs, bookshelf
must be updated with |c 6|
:
>>> print(bookshelf)
|a 7|b 5|c 6|d 8|e 2|f 0|g 4|h 8|i 7|j 1|k 6|l 0|m 5|n 0|o 3|p 7|q 2|r 2|s 4|t 6|u 1|v 3|w 3|x 5|y 7|z 6|
[86]:
book = "cycling in the wild"
#book = "algorithms of the occult"
#book = "theory of the zippo"
#book = "zoology of the software developer"
bookshelf = "|a 7|b 5|c 5|d 8|e 2|f 0|g 4|h 8|i 7|j 1|k 6|l 0|m 5|n 0|o 3|p 7|q 2|r 2|s 4|t 6|u 1|v 3|w 3|x 5|y 7|z 6|"
# write here
Replication operator
With the operator *
you can replicate a string n times, for example:
[87]:
'beer' * 4
[87]:
'beerbeerbeerbeer'
Note a NEW string is created, without modifying the original:
[88]:
drink = "beer"
[89]:
print(drink * 4)
beerbeerbeerbeer
[90]:
drink
[90]:
'beer'
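A few more properties of replication, sketched with illustrative variable names:

```python
drink = 'beer'

line = '-' * 10          # handy for drawing separators
nothing = drink * 0      # replicating zero times gives the empty string
swapped = 3 * 'ab'       # the number can also come first
same_length = len(drink * 4) == len(drink) * 4

print(line)           # ----------
print(nothing == '')  # True
print(swapped)        # ababab
print(same_length)    # True
```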
Exercise - za za za
Given a syllable
and a phrase
which ends with a digit character n
, write some code which prints a string with the syllable
repeated n
times, separated by spaces.
Your code must work with any string assigned to
syllable
and phrase
Example - given:
phrase = 'the number 7'
syllable = 'za'
after your code, it should print:
za za za za za za za
[91]:
phrase = 'the number 7'
syllable = 'za' # za za za za za za za
#phrase = 'Give me 5' # za za za za za
# write here
Continue
Go on reading notebook Strings 3 - basic methods