Strings 1 - introduction
Download exercises zip
Strings are immutable character sequences, and one of the basic Python types. In this notebook we will see how to manipulate them.
What to do
Unzip the exercises zip in a folder. You should obtain something like this:
strings
strings1.ipynb
strings1-sol.ipynb
strings2.ipynb
strings2-sol.ipynb
strings3.ipynb
strings3-sol.ipynb
strings4.ipynb
strings4-sol.ipynb
strings5-chal.ipynb
jupman.py
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !
Open Jupyter Notebook from that folder. Two things should open: first a console and then a browser. The browser should show a file list: navigate the list and open the notebook
strings1.ipynb
Go on reading the exercises file. Sometimes you will find paragraphs marked Exercises, which ask you to write Python commands in the following cells.
Shortcut keys:
to execute Python code inside a Jupyter cell, press
Control + Enter
to execute Python code inside a Jupyter cell AND select next cell, press
Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press
Alt + Enter
If the notebooks look stuck, try to select
Kernel -> Restart
Creating strings
There are several ways to define a string.
Double quotes, in one line
[2]:
a = "my first string, in double quotes"
[3]:
print(a)
my first string, in double quotes
Single quotes, in one line
This way is equivalent to the previous one.
[4]:
b = 'my second string, in single quotes'
[5]:
print(b)
my second string, in single quotes
Three double quotes, many lines
[6]:
c = """my third string
in triple double quotes
so I can put it
on many rows"""
[7]:
print(c)
my third string
in triple double quotes
so I can put it
on many rows
Three single quotes, many lines
[8]:
d = '''my fourth string,
in triple single quotes
also can be put
on many lines
'''
[9]:
print(d)
my fourth string,
in triple single quotes
also can be put
on many lines
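As a quick check that the notations above all produce ordinary strings, here is a minimal sketch (the variable names `single`, `double`, `multi` and `trailing` are only illustrative and do not appear in the notebook). It also shows that a triple-quoted string stores each line break as a newline character, so a string whose closing quotes sit on their own line, like the fourth example, ends with a newline.

```python
# Illustrative names, not part of the original notebook.
single = 'same text'
double = "same text"
print(single == double)    # True: the quote style is not part of the string

multi = """first row
second row"""
print(multi == "first row\nsecond row")    # True: the line break is stored as \n

trailing = '''last row
'''
print(trailing == 'last row\n')    # True: the string ends with a newline
```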
Printing - the cells
To print a string we can use the function print:
[10]:
print('hello')
hello
Note that the quotes are not shown in the printed output.
If we write the string without print, we will see the quotes:
[11]:
'hello'
[11]:
'hello'
What happens if we write the string with double quotes?
[12]:
"hello"
[12]:
'hello'
Notice that by default Jupyter shows single quotes.
The same applies if we assign a string to a variable:
[13]:
x = 'hello'
[14]:
print(x)
hello
[15]:
x
[15]:
'hello'
[16]:
y = "hello"
[17]:
print(y)
hello
[18]:
y
[18]:
'hello'
The empty string
The string of zero length is represented with two double quotes "" or two single quotes ''.
Note that even if we write two double quotes, Jupyter shows a string beginning and ending with single quotes:
[19]:
""
[19]:
''
The same applies if we associate an empty string to a variable:
[20]:
x = ""
[21]:
x
[21]:
''
Note that if we ask Jupyter to print an empty string, we won’t see any characters (only an empty line is printed):
[22]:
print("")
[23]:
print('')
Printing many strings
There are different ways to print many strings on a single line. Let’s start from the simplest one, with print:
[24]:
x = "hello"
y = "Python"
print(x,y) # note that in the printed characters Python inserted a space:
hello Python
We can pass to print as many parameters as we want, and they can also be mixed with other types like numbers:
[25]:
x = "hello"
y = "Python"
z = 3
print(x,y,z)
hello Python 3
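The space that print inserts between its parameters is only a default. As a small sketch that goes slightly beyond what this notebook covers, the built-in keyword argument `sep` lets us choose a different separator:

```python
x = "hello"
y = "Python"
z = 3

print(x, y, z)             # hello Python 3  (default separator: one space)
print(x, y, z, sep="-")    # hello-Python-3
print(x, y, z, sep="")     # helloPython3
```

Note the number 3 is printed without any explicit conversion: print converts each parameter to text on its own.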
Length of a string
To obtain the length of a string (or any sequence in general), we can use the function len:
[26]:
len("ciao")
[26]:
4
[27]:
len("") # empty string
[27]:
0
[28]:
len('') # empty string
[28]:
0
QUESTION: Can we write something like this?
"len"("hello")
[29]:
# write here
QUESTION: can we write something like this? What does it produce? an error? a number? which one?
len("len('hello')")
QUESTION: What do we obtain if we write like this?
len(((((("ciao"))))))
an error
the length of the string
something else
Counting escape sequences: note that some particular sequences called escape sequences, like for example \t, occupy less space than it seems (for len they count as 1 character), but when we print them they may occupy more than 2 positions!
Let’s see an example (in the next paragraph we will delve into the details):
[30]:
len('a\tb')
[30]:
3
[31]:
print('a\tb')
a b
Printing - escape sequences
Some character sequences called escape sequences are special because, instead of showing characters, they force the printing to do particular things like going to a new line or inserting extra space. These sequences always begin with the backslash character \:
Description | Escape sequence
---|---
Linefeed | \n
Tabulation (ASCII tab) | \t
Example - line feed
[32]:
print("hello\nworld")
hello
world
Note the line feed happens only when we use print; if instead we put the string directly into the cell, we will see it verbatim:
[33]:
"ciao\nmondo"
[33]:
'ciao\nmondo'
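Outside Jupyter, for example in a plain script, the same verbatim view can be obtained with the built-in function `repr`, which returns the quoted representation of a string. A minimal sketch:

```python
s = "ciao\nmondo"

print(s)          # the line feed is applied: two lines are printed
print(repr(s))    # 'ciao\nmondo'  (escape sequence shown verbatim, with quotes)
print(len(s))     # 10: the sequence \n counts as a single character
```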
In a string you can put as many escape sequences as you like:
[34]:
print("Today is\na great day\nisn't it?")
Today is
a great day
isn't it?
Example - tabulation
[35]:
print("hello\tworld")
hello world
[36]:
print("hello\tworld\twith\tmany\ttabs")
hello world with many tabs
EXERCISE: Since escape sequences are special, we might ask ourselves how long they are. Use the function len to print the length of the following strings. Do you notice anything strange?
'ab\ncd'
'ab\tcd'
[37]:
# write the code here
EXERCISE: Try selecting the character sequence printed in the previous cell with the mouse. What do you obtain? A space sequence, or a single tabulation character? Note this can vary according to the program that actually printed the string.
EXERCISE: find a SINGLE string which, when printed with print, is shown as follows:
This is
an
apparently simple challenge
USE ONLY combinations of \t and \n
DON’T use spaces
start and end the string with a single quote
[38]:
# write here
This is
an
apparently simple challenge
EXERCISE: try to find a string which, when printed with print, is shown as follows:
At te
n
t ion
please!
USE ONLY combinations of \t and \n
DON’T use any space
DON’T use triple quotes
[39]:
# write here
At te
n
t ion
please!
Special characters: if we want special characters like the single quote ' or the double quote " inside a string, we must create a so-called escape sequence, that is, we must first write the backslash character \ and then follow it with the special character we’re interested in:
Description | Escape sequence | Printed result
---|---|---
Single quote | \' | '
Double quote | \" | "
Backslash | \\ | \
Example
Let’s print a string containing a single quote ' and a double quote ":
[40]:
my_string = "This way I put \'apices\' and \"double quotes\" in strings"
[41]:
print(my_string)
This way I put 'apices' and "double quotes" in strings
If a string begins with double quotes, inside it we can freely use single quotes, even without the backslash \:
[42]:
print("There's no problem")
There's no problem
If the string begins with single quotes, we can freely use double quotes, even without the backslash \:
[43]:
print('It Is So "If You Think So"')
It Is So "If You Think So"
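The following sketch puts the escaped and unescaped notations side by side, to show they produce exactly the same strings (the variable names and the sample path are only illustrative):

```python
a = "There's no problem"       # single quote inside double quotes
b = 'There\'s no problem'      # same text, escaped inside single quotes
print(a == b)                  # True

c = 'It Is So "If You Think So"'
d = "It Is So \"If You Think So\""
print(c == d)                  # True

path = 'C:\\temp'              # a doubled backslash stores ONE backslash
print(path)                    # C:\temp
print(len(path))               # 7
```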
EXERCISE: Find a string which, when printed with print, shows the following sequence:
the string MUST start and finish with single quotes '
This "genius" of strings wants to /\\/ trick me \//\ with atrocious exercises O_o'
[44]:
# write here
This "genius" of strings wants to /\\/ trick me \//\ with atrocious exercises O_o'
Encodings
ASCII characters
When using strings in your daily programs you typically don’t need to care much how characters are physically represented as bits in memory, but sometimes it does matter. The representation is called encoding and must be taken into account in particular when you read stuff from external sources such as files and websites.
The most famous and widely used character encoding is ASCII (American Standard Code for Information Interchange), which offers 128 slots (codes 0 to 127) made of basic printable characters from the English alphabet (a-z, A-Z, punctuation like .;,! and characters like (, @ …) and control characters (like \t, \n).
See Printable characters (Wikipedia)
ASCII Control codes (Wikipedia)
Since the original ASCII table lacks support for non-English languages (for example, it lacks Italian accented letters like è, à, …), many extensions were made to support other languages; for examples, see the Extended ASCII page on Wikipedia.
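To look at the numeric code behind a character we can use the built-in functions `ord` (character to code) and `chr` (code to character). This is only a small sketch to make the ASCII range tangible; the variable names are illustrative:

```python
code_a = ord('a')
code_tab = ord('\t')
code_e_grave = ord('è')

print(code_a)                # 97
print(ord('A'))              # 65
print(code_tab)              # 9   (a control character)
print(ord('\n'))             # 10  (a control character)
print(chr(97))               # a
print(code_e_grave)          # 232
print(code_e_grave > 127)    # True: è falls outside the ASCII range
```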
Unicode characters
Whenever we need particular characters like ✪ which are not available on the keyboard, we can look at Unicode characters. There are a lot of them, and we can often use them in Python 3 by simply copy-pasting. For example, if you go to this page you can copy-paste the character ✪. In other cases a character might be so special that it can’t even be visualized correctly, so in these cases you can use a more complex sequence in the format \uxxxx like this:
Description | Escape sequence | Printed result
---|---|---
Example star in a circle in format \uxxxx | \u272A | ✪
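As a small check of the \uxxxx notation, the sketch below assumes the circled star corresponds to the code 272A (the variable name `star` is illustrative). It shows that the escape sequence and the copy-pasted character are the same one-character string:

```python
star = '\u272a'

print(star)                # ✪
print(star == '✪')         # True: same character, written in two ways
print(len(star))           # 1: the whole escape sequence is a single character
print(hex(ord(star)))      # 0x272a
```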
EXERCISE: Search Google for Unicode heart and try printing a heart in Python, both by directly copy-pasting the character and by using the notation \uxxxx
[45]:
# write here
I ♥ Python, with copy-paste
I ♥ Python, also in format \uxxxx
Unicode references: we only touched on Unicode, which can be a complex topic. If you ever need to deal with complex character sets like Japanese, or with heterogeneous text encodings, here are a couple of references you should read:
first part on Unicode encoding from Strings chapter from book Dive into Python 3
Python 3 Unicode documentation
Strings are immutable
Strings are immutable objects, so once they are created you cannot change them anymore. This might appear restrictive, but it’s not so tragic, because we still have these alternatives available:
generate a new string composed from other strings
if we have a variable to which we assigned a string, we can assign another string to that variable
Let’s generate a new string starting from previous ones, for example by joining two of them with the operator +
[46]:
x = 'hello'
[47]:
y = x + 'world'
[48]:
x
[48]:
'hello'
[49]:
y
[49]:
'helloworld'
The + operation, when executed between strings, joins them by creating a NEW string. This means that the association of x didn’t change at all; the only modification we can observe is the variable y, which is now associated to the string 'helloworld'. Try making sure of this in Python Tutor by repeatedly clicking on the Next button:
[50]:
# WARNING: before using the function jupman.pytut() which follows,
# it is necessary to first execute this cell with Shift+Enter
# it's sufficient to execute it only once, you find it also in all other notebooks in the first cell
import jupman
[51]:
x = 'hello'
y = x + 'world'
print(x)
print(y)
jupman.pytut()
hello
helloworld
[51]:
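To see immutability at work, we can try to overwrite a character in place. The sketch below uses the square bracket notation `x[0]` for the first character, which is assumed here and only explained in later notebooks; Python refuses the assignment with a TypeError, and the string stays as it was:

```python
x = 'hello'

try:
    x[0] = 'J'     # attempt to overwrite the first character in place
except TypeError as err:
    print('TypeError:', err)

print(x)           # hello: the original string is untouched
```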
Reassign variables
Other variations to memory state can be obtained by reassigning the variables, for example:
[52]:
x = 'hello'
[53]:
y = 'world'
[54]:
x = y # we assign to x the same string contained in y
[55]:
x
[55]:
'world'
[56]:
y
[56]:
'world'
If a string is created and at some point no variables point to it, Python automatically takes care of eliminating it from memory. In the case above, the string hello is never actually changed: at some point no variable is associated with it anymore, and so Python eliminates the string from memory. Have a look at what happens in Python Tutor:
[57]:
x = 'hello'
y = 'world'
x = y
jupman.pytut()
[57]:
Reassign a variable to itself
We may ask ourselves what happens when we write something like this:
[58]:
x = 'hello'
x = x
[59]:
print(x)
hello
No big changes: the assignment of x remained the same, without alterations.
But what happens if we put a more complex expression to the right of the =?
[60]:
x = 'hello'
x = x + 'world'
print(x)
helloworld
Let’s try to carefully understand what happened.
In the first line, Python generated the string 'hello' and assigned it to the variable x. So far, nothing extraordinary.
Then, in the second line, Python did two things:
it calculated the result of the expression x + 'world', by generating a NEW string helloworld
it assigned the generated string helloworld to the variable x
It is fundamental to understand that whenever a reassignment is performed both steps occur, so it’s worth repeating them:
FIRST the result of the expression to the right of = is calculated (so when the old value of x is still available)
THEN the result is associated to the variable to the left of the = symbol
If we check out what happens in Python Tutor, these two steps are executed in a single shot:
[61]:
x = 'hello'
x = x + 'world'
jupman.pytut()
[61]:
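The two steps can also be observed without Python Tutor. In this sketch we keep a second variable (the name `old` is only illustrative) associated with the original string before reassigning x:

```python
x = 'hello'
old = x               # a second variable associated with the same string

x = x + 'world'       # FIRST the right side is calculated, THEN x is reassigned

print(old)            # hello: the original string was never modified
print(x)              # helloworld
print(old is x)       # False: x is now associated with a different, new string
```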
EXERCISE: Write some code that changes memory state in such a way so that in the end the following is printed:
z = This
w = was
x = a problem
y = was
s = This was a problem
to write the code, USE ONLY the symbols =, +, z, w, x, y, s AND NOTHING ELSE
feel free to use as many lines of code as you deem necessary
feel free to use any symbol as many times as you deem necessary
[62]:
# these variables are given
z = "This"
w = 'is'
x = 'a problem'
y = 'was'
s = ' '
# write here the code
[63]:
print("z = ", z)
print("w = ", w)
print("x = ", x)
print("y = ", y)
print("s = ", s)
Strings and numbers
Python strings have the type str:
[64]:
type("hello world")
[64]:
str
In strings we can insert characters which represent digits:
[65]:
print("The character 5 represents the digit five, the character 3 represents the digit three")
The character 5 represents the digit five, the character 3 represents the digit three
Obviously, we can also substitute a sequence of digits, to obtain something which looks like a number:
[66]:
print("The sequence of characters 7583 represents the number seven thousand five hundred eighty-three")
The sequence of characters 7583 represents the number seven thousand five hundred eighty-three
Having said that, we can ask ourselves how Python behaves when we have a string which contains only a sequence of characters representing a number, like for example '254'.
Can we use '254' (which we wrote as a string) also as if it were a number? For example, can we add 3 to it?
'254' + 3
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-29-d39aa62a7e3d> in <module>
----> 1 "254" + 3
TypeError: can only concatenate str (not "int") to str
As you can see, Python immediately complains, because we are trying to mix different types.
SO:
by writing '254' between quotes we create a string of type str
by writing 254 we create a number of type int
[67]:
type('254')
[67]:
str
[68]:
type(254)
[68]:
int
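A few more checks can make the difference between the two types concrete. This is just a sketch using the built-in `isinstance` function and the `==` comparison; the variable names are illustrative:

```python
s = '254'
n = 254

print(s == n)                  # False: a string is never equal to a number
print(type(s) == type(n))      # False: str versus int
print(isinstance(s, str))      # True
print(isinstance(n, int))      # True
```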
BEWARE OF print !!
If you try to print a string which only contains digits, Python will show it without quotes, and this might mislead you about its true nature !!
[69]:
print('254')
254
[70]:
print(254)
254
Only in Jupyter, to show constants, variables or results of calculations, as an alternative to print you can directly write an expression in the cell. In this case we are simply showing a constant, and whenever it is a string you will see the quotes:
[71]:
'254'
[71]:
'254'
[72]:
254
[72]:
254
The same reasoning applies also to variables:
[73]:
x = '254'
[74]:
x
[74]:
'254'
[75]:
y = 254
[76]:
y
[76]:
254
So, only in Jupyter, when you need to show a constant, a variable or a calculation it’s often more convenient to write it directly in the cell without using print.
Conversions - from string to number
Let’s go back to the problem of summing '254' + 3. The first one is a string, the second a number. If they were both numbers the sum would surely work:
[77]:
254 + 3
[77]:
257
So we can try to convert the string '254' into an authentic integer. To do it, we can use int as if it were a function, and pass the string to be converted as argument:
[78]:
int('254') + 3
[78]:
257
WARNING: strings and numbers are immutable !!
This means that by writing int('254') a new number is generated without affecting in the least the string '254' we started from. Let’s see an example:
[79]:
x = '254' # assign to variable x the string '254'
[80]:
y = int(x) # assign to variable y the number obtained by converting '254' to int
[81]:
x # variable x is still assigned to string '254'
[81]:
'254'
[82]:
y # in y now there is a number instead (note we don't have apices here)
[82]:
254
It might be useful to see again the example in Python Tutor:
[83]:
x = "254"
y = int(x)
print(y + 3)
jupman.pytut()
257
[83]:
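The same idea works for a few other well-formed inputs. As a sketch that goes slightly beyond this section, the built-in `float` converts strings representing decimal numbers, and `int` tolerates a sign and surrounding whitespace:

```python
a = int('254') + 3
b = int('  254  ') + 3
c = int('-12')
d = float('1.6') * 2

print(a)    # 257
print(b)    # 257: whitespace around the digits is ignored
print(c)    # -12
print(d)    # 3.2
```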
EXERCISE: Try to convert a string which represents an ill-formed number (for example a number with a character inside: '43K12') into an int. What happens?
[84]:
# write here
Conversions - from number to string
Any object can be converted to a string by using str as if it were a function and by passing the object to convert. Let’s try to convert a number into a string.
[85]:
str(5)
[85]:
'5'
Note the quotes in the result, which show we actually obtained a string.
If by chance we want to obtain a string which is the concatenation of objects of different types we need to be careful:
x = 5
s = 'Workdays in a week are ' + x
print(s)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-154-5951bd3aa528> in <module>
1 x = 5
----> 2 s = 'Workdays in a week are ' + x
3 print(s)
TypeError: can only concatenate str (not "int") to str
A way to circumvent the problem (even if not the most convenient) is to convert into string each of the objects we’re using in the concatenation:
[86]:
x = 3
y = 1.6
s = "This week I've been jogging " + str(x) + " times running at an average speed of " + str(y) + " km/h"
print(s)
This week I've been jogging 3 times running at an average speed of 1.6 km/h
QUESTION: Having said that, after executing the code in the previous cell, is the variable x going to be associated to a number or a string?
If you have doubts, use Python Tutor.
Formatting strings
Concatenating strings with the plus sign like above is cumbersome and error prone. There are several better solutions; for a thorough review we refer to the Real Python website.
Formatting with %
Here we see how to format strings with the % operator. This solution is not the best one, but it’s widely used and supported in all Python versions, so we adopted it throughout the book:
[87]:
x = 3
"I jumped %s times" % x
[87]:
'I jumped 3 times'
Notice we put a so-called placeholder %s inside the string, which tells Python to replace it with the value of a variable. To feed Python the variable, after the string we have to put a % symbol followed by the variable, in this case x.
If we want to place more than one variable, we just add more %s placeholders, and after the external % we place the required variables in round parentheses, separating them with commas:
[88]:
x = 3
y = 5
"I jumped %s times and did %s sprints" % (x,y)
[88]:
'I jumped 3 times and did 5 sprints'
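%s is not the only placeholder available. As a brief sketch beyond what this notebook uses, %d expects an integer and %.2f shows a decimal number with two digits after the dot (the variable `speed` is only illustrative):

```python
x = 3
speed = 1.6

s_int = "I jumped %d times" % x
s_float = "Average speed: %.2f km/h" % speed
s_both = "%s times at %s km/h" % (x, speed)

print(s_int)      # I jumped 3 times
print(s_float)    # Average speed: 1.60 km/h
print(s_both)     # 3 times at 1.6 km/h
```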
We can put as many variables as we want, also non-numerical ones:
[89]:
x = 3
y = 5
prize = 'Best Athlete in Town'
"I jumped %s times, did %s sprints and won the prize '%s'" % (x,y,prize)
[89]:
"I jumped 3 times, did 5 sprints and won the prize 'Best Athlete in Town'"
Formatting with f-strings
f-strings allow us to directly insert expressions between curly brackets {} into the string. To tell Python to calculate the expressions and convert them into strings, the string must be preceded by the letter f. Note that the moment you add the f, your editor should show the expressions between curly brackets in a different color.
Warning: f-strings are only available in Python 3.6 and later
[90]:
title = "King of Great Britain"
start = 1760
end = 1801
s1 = f"George III was {title.upper()} from {start} until {end}."
print(s1)
s2 = f"He ruled for {end - start} years."
print(s2)
George III was KING OF GREAT BRITAIN from 1760 until 1801.
He ruled for 41 years.
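To compare the approaches seen so far, this sketch builds the same sentence with +, with % and with an f-string (all variable names are illustrative). The last line also shows that inside the curly brackets we can add a format like .2f after a colon:

```python
x = 3
y = 1.6

s_plus = "I ran " + str(x) + " times at " + str(y) + " km/h"
s_percent = "I ran %s times at %s km/h" % (x, y)
s_f = f"I ran {x} times at {y} km/h"

print(s_f)                            # I ran 3 times at 1.6 km/h
print(s_plus == s_percent == s_f)     # True: three notations, same string

s_two_digits = f"{y:.2f}"
print(s_two_digits)                   # 1.60
```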
Exercise - supercars
You’ve got some money, so you decide to buy two models of supercars. Since you already know accidents are on the way, for each model you will buy as many cars as there are characters in the model name.
Write some code which stores the number of cars you will buy into the strings:
s1 formatted with %s placeholders
s2 formatted as f-string
Example - given:
car1 = 'Jaguar'
car2 = 'Ferrari'
After your code, it should show:
>>> s1
'I will buy 6 Jaguar and 7 Ferrari supercars'
>>> s2
'I will buy 6 Jaguar and 7 Ferrari supercars'
[91]:
car1, car2 = 'Jaguar','Ferrari' # I will buy 6 Jaguar and 7 Ferrari supercars
#car1, car2 = 'Porsche','Lamborghini' # I will buy 7 Porsche and 11 Lamborghini supercars
# write here
Continue
Go on reading notebook Strings 2 - operators