# Strings 2 - operators


Python offers several operators to work with strings:

| Operator | Use | Result | Meaning |
|---|---|---|---|
| len | len(str) | int | Returns the length of the string |
| concatenation | str + str | str | Concatenates two strings |
| in | str in str | bool | Checks whether a string is contained inside another one |
| indexing | str[int] | str | Reads the character at the specified index |
| slice | str[int:int] | str | Extracts a substring |
| equality | ==, != | bool | Checks whether strings are equal or different |
| comparisons | <, <=, >, >= | bool | Performs lexicographic comparison |
| ord | ord(str) | int | Returns the order (position number) of a character |
| chr | chr(int) | str | Given an order, returns the corresponding character |
| replication | str * int | str | Replicates the string |
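As a quick reference, here is a minimal sketch that applies each operator of the table to one example string. The string "park" and the other values are arbitrary choices for illustration; each operator is explained in detail in the rest of the notebook.

```python
# A quick tour of the string operators listed in the table
s = "park"

print(len(s))            # length: 4
print(s + "ing")         # concatenation: parking
print("ar" in s)         # containment: True
print(s[0])              # indexing: p
print(s[1:3])            # slice from index 1 (included) to 3 (excluded): ar
print(s == "park")       # equality: True
print(s != "Park")       # inequality (case matters): True
print("apple" < s)       # lexicographic comparison: True
print(ord("a"))          # order of a character: 97
print(chr(97))           # character with a given order: a
print(s * 2)             # replication: parkpark
```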

## What to do

1. Unzip the exercises zip into a folder; you should obtain something like this:

    strings
        strings1.ipynb
        strings1-sol.ipynb
        strings2.ipynb
        strings2-sol.ipynb
        strings3.ipynb
        strings3-sol.ipynb
        strings4.ipynb
        strings4-sol.ipynb
        jupman.py


WARNING: to display the notebook correctly, it MUST be in an unzipped folder!

2. Open Jupyter Notebook from that folder. Two things should open: first a console, then a browser. The browser should show a file list: navigate the list and open the notebook strings2.ipynb

3. Go on reading the exercises file. Sometimes you will find paragraphs marked Exercises, which will ask you to write Python commands in the following cells. Exercises are graded by difficulty, from one star ✪ to four ✪✪✪✪

Shortcut keys:

• to execute Python code inside a Jupyter cell, press Control + Enter

• to execute Python code inside a Jupyter cell AND select next cell, press Shift + Enter

• to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter

• If the notebook looks stuck, try selecting Kernel -> Restart

A string is a sequence of characters, and often we might want to access a single character by specifying the position of the character we are interested in.

It’s important to remember that the positions of characters in strings start from 0. To read the character at a certain position, we write the string followed by square brackets containing the position. Examples:

[2]:

'park'[0]

[2]:

'p'

[3]:

'park'[1]

[3]:

'a'

[4]:

#0123
'park'[2]

[4]:

'r'

[5]:

#0123
'park'[3]

[5]:

'k'


If we try to go beyond the last character, we will get an error:

#0123
'park'[4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-106-b8f1f689f0c7> in <module>
1 #0123
----> 2 'park'[4]

IndexError: string index out of range


So far we specified the string as a literal, but we can also use variables:

[6]:

    #01234
x = 'cloud'

[7]:

x[0]

[7]:

'c'

[8]:

x[2]

[8]:

'o'


How is the character we’ve just read represented? If you noticed, it is enclosed in quotes, as if it were a string. Let’s check:

[9]:

type(x[0])

[9]:

str


It’s really a string. To some this might come as a surprise, even from a philosophical standpoint: Python strings are made of… strings! Other programming languages may use a specific type for single characters, but Python uses strings to better manage complex alphabets such as, for example, Japanese.
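A minimal sketch to confirm this: reading one position of a string gives back a string of length 1, and the same holds for a non-Latin alphabet. The Japanese word '日本' is just an arbitrary example.

```python
x = 'cloud'

first = x[0]
print(type(first))        # <class 'str'>
print(len(first))         # 1: a "character" is simply a string of length 1
print(first == 'c')       # True: it equals the one-character literal 'c'

# Works the same way with non-Latin alphabets, e.g. Japanese
word = '日本'
print(word[0], type(word[0]), len(word[0]))   # 日 <class 'str'> 1
```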

QUESTION: Let’s suppose x is any string. If we try to execute this code:

x[0]


we will get:

1. always a character

2. always an error

3. sometimes a character, sometimes an error, depending on the string

QUESTION: Let’s suppose x is an empty string. If we try to execute this code:

x[len(x)]


we will get:

1. always a character

2. always an error

3. sometimes a character, sometimes an error, depending on the string at hand

### Exercise - alternate

Given two strings, both of length 3, print a string which alternates characters from both strings. Your code must work with any strings of this length.

Example - given:

x="say"
y="hi!"


it should print:

shaiy!

[10]:

# write here


shaiy!


### Negative indexes

In Python we can also use negative indexes, which count from the end of the string instead of from the beginning:

[11]:

#4321
"park"[-1]

[11]:

'k'

[12]:

#4321
"park"[-2]

[12]:

'r'

[13]:

#4321
"park"[-3]

[13]:

'a'

[14]:

#4321
"park"[-4]

[14]:

'p'


If we go one step beyond, we get an error:

#4321
"park"[-5]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-126-668d8a13a324> in <module>
----> 1 "park"[-5]

IndexError: string index out of range
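One way to read negative indexes, shown here as a minimal sketch on the same "park" string: each valid negative index -i refers to the same character as the positive index len(x) - i.

```python
x = "park"

# For every valid negative index -i, x[-i] is the same character as x[len(x) - i]
for i in range(1, len(x) + 1):
    print(-i, x[-i], len(x) - i, x[len(x) - i])
```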


QUESTION: Suppose x is a NON-empty string. What do we get with the following expression?

x[-len(x)]

1. always a character

2. always an error

3. sometimes a character, sometimes an error, depending on the string

QUESTION: Suppose x is some string (possibly empty). Consider the expressions

x[len(x) - 1]


and

x[-len(x)]


Are they equivalent? What do they do?

QUESTION: If x is a non-empty string, what does the following expression produce? Can we simplify it to a shorter one?

(x + x)[len(x)]


QUESTION: What does the following expression produce? An error? Something else? Can we simplify it?

'park'[0][0]


QUESTION: If x is a non-empty string, what does the following expression produce? An error? Something else? Can we simplify it?

(x[0])[0]


## Substitute characters

We said strings in Python are immutable. Suppose we have a string like this:

[15]:

    #01234
x = 'port'


and, for example, we want to change the character at position 2 (in this case, the r) into an s. What do we do?

We might be tempted to write the following, but Python would punish us with an error:

x[2] = 's'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-113-e5847c6fa4bf> in <module>
----> 1 x[2] = 's'

TypeError: 'str' object does not support item assignment


The correct solution is assigning a completely new string to x, obtained by taking pieces from the previous one:

[16]:

x = x[0] + x[1] + 's' + x[3]

[17]:

x

[17]:

'post'


If seeing x to the right of the equal sign baffles you, we can decompose the code like this and it will work the same way:

[18]:

x = "port"
y = x
x = y[0] + y[1] + 's' + y[3]
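To make the effect of the decomposed version visible, here is a minimal sketch that prints both variables afterwards: x is rebound to a brand new string, while y still refers to the original, untouched 'port'.

```python
x = "port"
y = x                            # y refers to the same string as x
x = y[0] + y[1] + 's' + y[3]     # builds a NEW string and rebinds x to it

print(x)   # post
print(y)   # port: the original string was never modified
```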


Try it in Python Tutor:

[19]:

x = "port"
y = x
x = y[0] + y[1] + 's' + y[3]

jupman.pytut()

[19]:


## Slices

We might want to read only a subsequence that starts at one position and ends at another. For example, suppose we have:

[20]:

    #0123456789
x = 'mercantile'


and we want to extract the string 'canti', which starts at index 3 (included) and ends at index 8 (excluded). We might extract the single characters and concatenate them with the + sign, but we would write a lot of code. A better option is to use the so-called slices: simply write the string followed by square brackets containing the start index (included), a colon, and finally the end index (excluded):

[21]:

    #0123456789
x = 'mercantile'

x[3:8]   # note the : between start and end indexes

[21]:

'canti'


WARNING: Extracting with slices DOES NOT modify the original string !!

Let’s see an example:

[22]:

    #0123456789
x = 'mercantile'

print('               x is', x)
print('The slice x[3:8] is', x[3:8])
print('               x is', x)       # note x continues to point to old string!

               x is mercantile
The slice x[3:8] is canti
               x is mercantile
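A minimal sketch of what the slice x[3:8] corresponds to: the same result as concatenating the characters at indexes 3 to 7 by hand, and its length is always end - start when both limits are within the string.

```python
x = 'mercantile'

# x[3:8] takes the characters at indexes 3, 4, 5, 6, 7 (index 8 is excluded)
by_slice = x[3:8]
by_hand = x[3] + x[4] + x[5] + x[6] + x[7]

print(by_slice, by_hand, by_slice == by_hand)   # canti canti True
print(len(by_slice))   # 5, i.e. end - start = 8 - 3
```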


QUESTION: if x is any string of length at least 5, what does this code produce? An error? It works? Can we shorten it?

x[3:4]


### Exercise - garalampog

Write some code to extract and print alam from the string "garalampog". Try guessing the correct indexes.

[23]:

x = "garalampog"

# write here


alam


### Exercise - ifEweEfav lkSD lkWe

Write some code to extract and print kD from the string "ifE\te\nfav  lkD lkWe". Be careful of spaces and special characters (you might want to print x first). Try guessing the correct indexes.

[24]:

x = "ifE\te\nfav  lkD lkWe"

# write here


kD


### Slices - limits

Whenever we use slices we must be careful with index limits. Let’s see how they behave:

[25]:

#012345
"chair"[0:3]  # from index 0 *included* to 3 *excluded*

[25]:

'cha'

[26]:

#012345
"chair"[0:4]  # from index 0 *included* to 4 *excluded*

[26]:

'chai'

[27]:

#012345
"chair"[0:5]  # from index 0 *included* to 5 *excluded*

[27]:

'chair'

[28]:

#012345
"chair"[0:6]   # if we go beyond the string length, Python doesn't complain

[28]:

'chair'
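The contrast with single-index access is worth seeing side by side. In this minimal sketch, slice limits beyond the end are simply clamped to the string length, while a single index beyond the end still raises IndexError, as we saw earlier.

```python
word = "chair"

print(repr(word[0:6]))     # 'chair': slice limits are clamped to the string length
print(repr(word[10:20]))   # '': a slice entirely past the end is just empty

try:
    word[6]                 # a single index past the end is NOT clamped
except IndexError as e:
    print("IndexError:", e)   # IndexError: string index out of range
```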


QUESTION: if x is any string (possibly empty), what does this expression do? Can it give an error? Does it return something useful?

x[0:len(x)]


### Slice - Omitting limits

If we want, we can omit the starting index, in which case Python assumes it is 0:

[29]:

#0123456789
"catamaran"[:3]

[29]:

'cat'


It’s also possible to omit the ending index, in which case Python extracts until the end of the string:

[30]:

#0123456789
"catamaran"[3:]

[30]:

'amaran'


By omitting both indexes we obtain the full string:

[31]:

"catamaran"[:]

[31]:

'catamaran'
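Omitting a limit makes a handy property easy to state: splitting a string at any index k with x[:k] and x[k:] and gluing the two pieces back together gives the original string. A minimal sketch, with a few arbitrarily chosen split points:

```python
x = "catamaran"

# x[:k] + x[k:] always rebuilds x, whatever k we pick
for k in (0, 3, len(x)):
    left, right = x[:k], x[k:]
    print(k, repr(left), repr(right), left + right == x)
```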


### Exercise - ysterymyster

Write some code that, given a string x, prints the string composed of all the characters of x except the first one, followed by all the characters of x except the last one.

• your code must work with any string

Example 1 - given:

x = "mystery"


must print:

ysterymyster


Example 2 - given:

x = "talking"


must print:

alkingtalkin

[32]:

x = "mystery"
#x = "talking"

# write here


ysterymyster


### Slice - negative limits

If we want, it’s also possible to set negative limits, although it’s not always intuitive:

[33]:

#0123456

"vegetal"[3:0]   # from index 3 to positive indexes <= 3 doesn't produce anything

[33]:

''

[34]:

#0123456
"vegetal"[3:1]   # from index 3 to positive indexes <= 3 doesn't produce anything

[34]:

''

[35]:

#0123456
"vegetal"[3:2]  # from index 3 to positive indexes <= 3 doesn't produce anything

[35]:

''

[36]:

#0123456
"vegetal"[3:3]  # from index 3 to positive indexes <= 3 doesn't produce anything

[36]:

''


Let’s see what happens with negative indexes:

[37]:

#0123456   positive indexes
#7654321   negative indexes
"vegetal"[3:-1]

[37]:

'eta'

[38]:

#0123456   positive indexes
#7654321   negative indexes
"vegetal"[3:-2]

[38]:

'et'

[39]:

#0123456   positive indexes
#7654321   negative indexes
"vegetal"[3:-3]

[39]:

'e'

[40]:

#0123456   positive indexes
#7654321   negative indexes
"vegetal"[3:-4]

[40]:

''

[41]:

#0123456   positive indexes
#7654321   negative indexes
"vegetal"[3:-5]

[41]:

''
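A way to predict the results above: a negative limit -k behaves like the positive limit len(x) - k. This minimal sketch prints both forms side by side for the same "vegetal" slices.

```python
x = "vegetal"

# A negative limit -k is interpreted as len(x) - k
for k in range(1, 6):
    neg = x[3:-k]
    pos = x[3:len(x) - k]
    print(f"x[3:-{k}] = {neg!r}   x[3:{len(x) - k}] = {pos!r}")
```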


### Exercise - javarnanda

Given a string x, write some code to extract and print its first 3 characters followed by its last 3 characters.

• Your code should work for any string of length greater than or equal to 3

Example 1 - given:

x = "javarnanda"


it should print:

javnda


Example 2 - given:

x = "abcd"


it should print:

abcbcd

[42]:

x = "javarnanda"
#x = "abcd"

# write here


javnda


### Slice - modifying

Suppose we have the string

[43]:

    #0123456789
s = "the table is placed in the center of the room"


and we want to change the assignment of s so that it becomes associated with the string:

#0123456789
"the chair is placed in the center of the room"


Since both strings are similar, we might be tempted to only redefine the character sequence which corresponds to the word "table", which goes from index 4 included to index 9 excluded:

s[4:9] = "chair"   # WARNING! WRONG!

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-57-0de7363c6882> in <module>
----> 1 s[4:9] = "chair"   # WARNING! WRONG!

TypeError: 'str' object does not support item assignment


Sadly, we would receive an error because, as repeated many times, strings are IMMUTABLE: we cannot select a chunk of a string and change the original string in place. What we can do instead is build a NEW string from pieces of the original one, concatenating the desired characters, and assign the result to the variable whose assignment we want to change:

[44]:

    #0123456789
s = "the table is placed in the center of the room"
s = s[0:4] + "chair" + s[9:]
print(s)

the chair is placed in the center of the room


When Python finds the line

s = s[0:4] + "chair" + s[9:]


FIRST it calculates the result on the right of the =, and THEN associates that result with the variable on the left. The expression on the right only generates NEW strings, which, once built, can be assigned to the variable s.
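The same pattern can be wrapped in a small function. The name replace_slice is a hypothetical helper made up for this sketch (it is not a Python built-in): it returns a new string where the chunk between start (included) and end (excluded) is replaced, leaving the input untouched.

```python
def replace_slice(s, start, end, new):
    """Hypothetical helper: return a NEW string where s[start:end] is replaced by new.

    The string s itself is never modified (strings are immutable)."""
    return s[:start] + new + s[end:]

s = "the table is placed in the center of the room"
s = replace_slice(s, 4, 9, "chair")   # rebind s to the new string
print(s)   # the chair is placed in the center of the room
```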

### Exercise - the run

Write some code such that when given the string s

s = 'The Gold Rush has begun.'


and some variables

what = 'Atom'
happened = 'is over'


replaces the substring 'Gold' with the string in the variable what, and replaces the substring 'has begun' with the string in the variable happened.

After executing your code, the string associated with s should be

>>> print(s)
The Atom Rush is over.

• DON’T use constant characters in your code, i.e. dots '.' aren’t allowed!

[45]:

    #01234567890123456789012345678
s = 'The Gold Rush has begun.'
what = 'Atom'
happened = 'is over'

# write here


The Atom Rush is over.


## in operator

To check whether a string is contained in another one, we use the in operator.

Note the result of this expression is a boolean:

[46]:

'the' in 'Singing in the rain'

[46]:

True

[47]:

'si' in 'Singing in the rain'  # in operator is case-sensitive

[47]:

False

[48]:

'Si' in 'Singing in the rain'

[48]:

True
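A few more behaviours of in, as a minimal sketch: the empty string counts as contained in any string, not in gives the negated check, and for a case-insensitive check one option is to lowercase the text first with the lower method (a string method covered in the next notebook).

```python
phrase = 'Singing in the rain'

print('' in phrase)                    # True: the empty string is contained in any string
print('si' in phrase)                  # False: the check is case-sensitive
print('si' in phrase.lower())          # True: lower() makes a lowercase copy first
print('rain' not in phrase)            # False: 'not in' is the negated form
```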


### Exercise - contained 1

You are given two strings x and y, and a third string z. Write some code that prints True if x and y are both contained in z, and False otherwise.

Example 1 - given:

x = 'cad'
y = 'ra'
z = 'abracadabra'


it should print:

True


Example 2 - given:

x = 'zam'
y = 'ra'
z = 'abracadabra'


it should print:

False

[49]:

x,y,z = 'cad','ra','abracadabra'   # True

# write here


True


### Exercise - contained 2

Given three strings x, y, z, write some code that prints True if the string x is contained in at least one of the strings y or z, and False otherwise.

• your code should work with any set of strings

Example 1 - given:

x = "ope"
y = "honesty makes for long friendships"
z = "I hope it's clear enough"


it should print:

True

Example 2 - given:

x = "nope"
y = "honesty makes for long friendships"
z = "I hope it's clear enough"


it should print:

False

Example 3 - given:

x = "cle"
y = "honesty makes for long friendships"
z = "I hope it's clear enough"


it should show:

True

[50]:

x,y,z = "ope","honesty makes for long friendships","I hope it's clear enough"   # True
#x,y,z = "nope","honesty makes for long friendships","I hope it's clear enough"  # False
#x,y,z = "cle","honesty makes for long friendships","I hope it's clear enough"  # True

# write here


True


## Comparisons

Python lets us perform lexicographic comparisons among strings, as we would when placing names in an address book. Although sorting names is something we often do intuitively, we must be careful about special cases.

First, let’s determine when two strings are equal.

### Equality operators

To check whether two strings are equal, you can use the operator ==, which produces as result the boolean True or False.

WARNING: == is written with TWO equal signs !!!

[51]:

"dog" == "dog"

[51]:

True

[52]:

"dog" == "wolf"

[52]:

False


The equality operator is case-sensitive:

[53]:

"dog" == "DOG"

[53]:

False


To check whether two strings are NOT equal, we can use the operator !=, which we can expect to behave exactly as the opposite of ==:

[54]:

"dog" != "dog"

[54]:

False

[55]:

"dog" != "wolf"

[55]:

True

[56]:

"dog" != "DOG"

[56]:

True


As an alternative, we might use the operator not:

[57]:

not "dog" == "dog"

[57]:

False

[58]:

not "wolf" == "dog"

[58]:

True

[59]:

not "dog" == "DOG"

[59]:

True
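To check that the three forms always agree, here is a minimal sketch over the same pairs of words used above. Note that not a == b is read as not (a == b), because not has lower precedence than comparison operators.

```python
pairs = [("dog", "dog"), ("dog", "wolf"), ("dog", "DOG")]

for a, b in pairs:
    # != always gives the opposite of ==, exactly like 'not a == b'
    print(a, b, a == b, a != b, not a == b)
```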


QUESTION: what does the following code print?

x = "river" == "river"
print(x)


QUESTION: for each of the following expressions, try to guess whether it produces True or False

1. 'hat' != 'Hat'

2. 'hat' == 'HAT'

3. 'choralism'[2:5] == 'contemporary'[7:10]

4. 'AlAbAmA'[4:] == 'aLaBaMa'

5. 'bright'[9:20] == 'dark'[10:15]

6. 'optical'[-1] == 'crystal'[-1]

7. ('hat' != 'jacket') == ('trousers' != 'bow')

8. ('stra' in 'stradivarius') == ('div' in 'digital divide')

9. len('note') in '5436'

10. str(len('note')) in '5436'

11. len('posters') in '5436'

12. str(len('posters')) in '5436'


### Exercise - statist

Write some code which prints True if a word begins with the same two characters it ends with.

• Your code should work for any word

[60]:

word = 'statist'   # True
#word = 'baobab'   # False
#word = 'maxima'   # True
#word = 'karma'    # False

# write here


True


### Comparing characters

Characters have an inherent order we can exploit. Let’s see an example:

[61]:

'a' < 'g'

[61]:

True


another one:

[62]:

'm' > 'c'

[62]:

True


These sound like reasonable comparisons! But what about this (notice the capital 'Z')?

[63]:

'a' < 'Z'

[63]:

False


Maybe this doesn’t look so obvious. And what if we get creative and compare with symbols such as square brackets or Unicode hearts?

[64]:

'a' > '♥'

[64]:

False


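To determine how to deal with these special cases, we must remember that ASCII assigns a position number to each character, in effect defining an ordering among all its characters. Strictly speaking, Python compares the Unicode code points of characters, which coincide with the ASCII numbers for ASCII characters.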

If we want to know the corresponding number of a character, we can use the function ord:

[65]:

ord('a')

[65]:

97

[66]:

ord('b')

[66]:

98

[67]:

ord('z')

[67]:

122


If we want to go the other way, given a position number we can obtain the corresponding character with chr function:

[68]:

chr(97)

[68]:

'a'


Uppercase characters have different positions:

[69]:

ord('A')

[69]:

65

[70]:

ord('Z')

[70]:

90
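A minimal sketch of two facts that follow from the numbers above: ord and chr undo each other, and in ASCII each lowercase letter sits exactly 32 positions after its uppercase version (97 - 65 = 32).

```python
# ord and chr are inverse functions of each other
for c in "aAzZ":
    print(c, ord(c), chr(ord(c)) == c)

# In ASCII, each lowercase letter sits 32 positions after its uppercase version
print(ord('a') - ord('A'))   # 32
print(chr(ord('G') + 32))    # g
```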


EXERCISE: Using the functions above, try to find which characters lie between capital Z and lowercase a.

[1]:

# write here



The ordering allows us to perform lexicographic comparisons between single characters:

[72]:

'a' < 'b'

[72]:

True

[73]:

'g' >= 'm'

[73]:

False


EXERCISE: Write some code that:

1. prints the ord values of 'A', 'Z' and a given char

2. prints True if char is uppercase, and False otherwise

• Would your code also work with accented uppercase characters such as 'Á'?

• NOTE: the possible character sets are way too many, so the proper solution would be to use the method isupper, which we will see in the next tutorial.

[15]:

char = 'G'   # True
#char = 'g'  # False
#char = 'Á'  # True ??
# write here


A: 65  Z: 90
G: 71

[15]:

True


Also, since the Unicode character set includes ASCII (with the same position numbers), ASCII characters can safely be compared against Unicode characters: comparing two characters always gives the same result as comparing their ord values:

[74]:

ord('a')   # ascii

[74]:

97

[75]:

ord('♥')   # unicode

[75]:

9829

[76]:

'a' > '♥'

[76]:

False

[77]:

ord('a') > ord('♥')

[77]:

False
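To convince ourselves that the equivalence holds in general, this minimal sketch checks it for every pair in a small, arbitrarily chosen sample of characters, then sorts the sample, which orders it by position number.

```python
samples = ['a', 'Z', '[', '♥', '0', ' ']

# For any two single characters, c1 < c2 gives the same answer as ord(c1) < ord(c2)
for c1 in samples:
    for c2 in samples:
        assert (c1 < c2) == (ord(c1) < ord(c2))

print(sorted(samples))   # [' ', '0', 'Z', '[', 'a', '♥']
```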


Python also offers lexicographic comparisons on strings with more than one character. To understand what the expected result should be, we must distinguish among several cases, though:

• strings of equal / different length

• strings with same / mixed case

Let’s begin with same length strings:

[78]:

'mario' > 'luigi'

[78]:

True

[79]:

'mario' > 'wario'

[79]:

False

[80]:

'Mario' > 'Wario'

[80]:

False

[81]:

'mario' > 'Wario'  # uppercase letters come *before* lowercase ones in ASCII

[81]:

True


### Comparing different lengths

A string which is a prefix of a longer one comes first in the ordering:

[82]:

'troll' < 'trolley'

[82]:

True


If the two strings only share a common prefix, Python compares the first characters after that prefix: in this case it detects that s is greater than the corresponding e:

[83]:

'trolls' < 'trolley'

[83]:

False
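The rules above can be summarised in a few lines of code. The function lex_less below is a hypothetical, simplified re-implementation written only for illustration (Python's real comparison is built in): it walks both strings in parallel, lets the first differing character decide, and otherwise puts the shorter string first.

```python
def lex_less(a, b):
    """Hypothetical re-implementation of a < b for strings, for illustration only."""
    # Compare character by character, up to the length of the shorter string
    for ca, cb in zip(a, b):
        if ca != cb:
            return ca < cb      # the first differing character decides
    # No difference found: the shorter string (a prefix of the other) comes first
    return len(a) < len(b)

for a, b in [('troll', 'trolley'), ('trolls', 'trolley'), ('mario', 'Wario'), ('dog', 'dog')]:
    print(a, b, lex_less(a, b), a < b)
```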


### Exercise - Character intervals

You are given a pair of strings i1 and i2 of two characters each.

We suppose they represent character intervals: the first character of an interval always has an order number lower than or equal to the second.

There are five possibilities: the first interval either ‘is contained in’, ‘contains’, ‘overlaps’, ‘is before’ or ‘is after’ the second interval. Write some code which prints which of these relations holds.

Example 1 - given:

i1 = 'gm'
i2 = 'cp'


it should print:

gm is contained in cp


To see why, you can look at this little representation (you don’t need to print this!):

  c   g     m  p
abcdefghijklmnopqrstuvwxyz


Example 2 - given:

i1 = 'mr'
i2 = 'pt'


it should print:

mr overlaps pt


because mr is neither contained in pt, nor contains it, nor completely precedes or follows it (you don’t need to print this!):

            m  p r t
abcdefghijklmnopqrstuvwxyz

• if interval i1 coincides with i2, it is considered as containing i2

• DO NOT use loops or if

• HINT: to satisfy the above constraint, think about the evaluation order of boolean operators; for example, the expression

'g' >= 'c' and 'm' <= 'p' and 'is contained in'


produces as result the string 'is contained in'
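The hint relies on the fact that and and or return one of their operands rather than just True or False. A minimal sketch with arbitrary values (the number n and the labels are made up for illustration); note that this idiom only works when the labels themselves are truthy, e.g. non-empty strings.

```python
# 'and' returns its first falsy operand, or the last operand if all are truthy
print(3 > 1 and 'yes')        # yes
print(3 < 1 and 'yes')        # False

# 'or' returns its first truthy operand, or the last one if none is truthy
print(False or 'fallback')    # fallback
print('first' or 'second')    # first

# Chaining both picks the first label whose condition holds ('and' binds tighter than 'or')
n = 5
print(n < 0 and 'negative' or n == 0 and 'zero' or 'positive')   # positive
```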

[84]:

i1,i2 = 'gm','cp'   # gm is contained in cp
#i1,i2 = 'dh','dh'  # dh contains dh  #(special case)
#i1,i2 = 'bw','dq'  # bw contains dq
#i1,i2 = 'ac','bd'  # ac overlaps bd
#i1,i2 = 'mr','pt'  # mr overlaps pt
#i1,i2 = 'fm','su'  # fm is before su
#i1,i2 = 'xz','pq'  # xz is after pq

# write here



## Replication operator

With the operator * you can replicate a string n times, for example:

[85]:

'beer' * 4

[85]:

'beerbeerbeerbeer'


Note a NEW string is created, without altering the original:

[86]:

drink = "beer"

[87]:

print(drink * 4)

beerbeerbeerbeer

[88]:

drink

[88]:

'beer'
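A few edge cases of replication, as a minimal sketch: zero or negative counts produce the empty string, the length of the result is the original length times the count, and the integer may also come first.

```python
drink = "beer"

print(repr(drink * 0))       # '': zero copies
print(repr(drink * -2))      # '': negative counts also give the empty string
print(len(drink * 4) == len(drink) * 4)   # True
print(3 * drink)             # beerbeerbeer: the int can also come first
```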


### Exercise - za za za

Given a syllable and a phrase which ends with a digit n, write some code which prints a string with the syllable repeated n times, separated by spaces.

• Your code must work with any string assigned to syllable and phrase

Example - given:

phrase = 'the number 7'
syllable = 'za'


after your code runs, it should print:

za za za za za za za

[89]:


phrase = 'the number 7'
syllable = 'za'         # za za za za za za za
#phrase = 'Give me 5'   # za za za za za

# write here


za za za za za za za


## Continue

Go on reading the notebook Strings 3 - methods
