Lists 4 - Search methods

Lists offer several different methods to perform searches and transformations inside them, but beware: the power is nothing without control! Sometimes you might feel the need to use them, but very often they hide traps you will later regret. So whenever you write code with one of these methods, always ask yourself the questions we will stress.

| Method | Returns | Description |
|--------|---------|-------------|
| str1.split(str2) | list | Produces a list with all the substrings of str1 separated by str2 |
| list.count(obj) | int | Counts the occurrences of an element |
| list.index(obj) | int | Searches for the first occurrence of an element and returns its position |
| list.remove(obj) | None | Removes the first occurrence of an element |

What to do

1. Unzip the exercises zip in a folder; you should obtain something like this:

lists
lists1.ipynb
lists1-sol.ipynb
lists2.ipynb
lists2-sol.ipynb
lists3.ipynb
lists3-sol.ipynb
lists4.ipynb
lists4-sol.ipynb
lists5-chal.ipynb
jupman.py


WARNING: to be displayed correctly, the notebook MUST be in an unzipped folder!

2. Open Jupyter Notebook from that folder. Two things should open: first a console and then a browser. The browser should show a file list: navigate the list and open the notebook lists4.ipynb

3. Go on reading the exercises file: from time to time you will find paragraphs marked Exercises, which ask you to write Python commands in the following cells.

Shortcut keys:

• to execute Python code inside a Jupyter cell, press Control + Enter

• to execute Python code inside a Jupyter cell AND select next cell, press Shift + Enter

• to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter

• If the notebook looks stuck, try to select Kernel -> Restart

split method - from strings to lists

The split method is called on a string and takes a separator as parameter, which can be a single character or a longer substring. The result is a list of the strings found between the separators, with the separators themselves left out.

[2]:

"Finally the pirates shared the treasure".split("the")

[2]:

['Finally ', ' pirates shared ', ' treasure']


In practice this method is the opposite of the join method we’ve already seen: join takes a list of strings and produces a single string, while split takes a single string and produces a list. Note that both are called on strings, not on lists.
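A minimal sketch of the round trip between the two methods. The string and the variable names (`phrase`, `words`, `rebuilt`) are made up for illustration:

```python
phrase = "ahoy-there-matey"

# split: from a string to a list of strings
words = phrase.split("-")
print(words)               # ['ahoy', 'there', 'matey']

# join: called on the separator, from a list of strings back to one string
rebuilt = "-".join(words)
print(rebuilt)             # ahoy-there-matey

print(rebuilt == phrase)   # True
```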

When split is called without arguments, any whitespace is used as separator (space, newline \n, tab \t, etc.), and consecutive whitespace characters count as a single separator:

[3]:

s = "Finally the\npirates\tshared     the treasure"
print(s)

Finally the
pirates shared     the treasure

[4]:

s.split()

[4]:

['Finally', 'the', 'pirates', 'shared', 'the', 'treasure']
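Calling split with no argument is not the same as passing a single space. The sketch below, on a made-up string `text`, compares the two calls:

```python
text = "two  spaces\tand a tab"

# no argument: any run of whitespace counts as ONE separator
no_arg = text.split()
print(no_arg)       # ['two', 'spaces', 'and', 'a', 'tab']

# explicit ' ': every single space separates, tabs are left alone
with_space = text.split(' ')
print(with_space)   # ['two', '', 'spaces\tand', 'a', 'tab']
```

Note the empty string produced by the two consecutive spaces in the second call.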


It’s also possible to limit the number of splits performed by specifying the parameter maxsplit: the resulting list will contain at most maxsplit + 1 elements.

[5]:

s.split(maxsplit=2)

[5]:

['Finally', 'the', 'pirates\tshared     the treasure']
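maxsplit also works together with an explicit separator. A small sketch, where the `record` string and its colon-separated layout are invented for the example:

```python
record = "Blackbeard:captain:Revenge:1718"

# at most 1 split, so at most 2 pieces: the rest of the string is left intact
name_and_rest = record.split(":", maxsplit=1)
print(name_and_rest)        # ['Blackbeard', 'captain:Revenge:1718']
print(len(name_and_rest))   # 2
```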


WARNING: What happens if the string does not contain the separator? Remember to also consider this case!

[6]:

"I talk and overtalk and I never ever take a break".split(',')

[6]:

['I talk and overtalk and I never ever take a break']
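One possible way to guard against a missing separator is to check the length of the result before reading beyond the first element. This is only a sketch: the names `line`, `pieces` and `second`, and the choice of an empty string as fallback, are our own assumptions.

```python
line = "I talk and overtalk"
pieces = line.split(',')

print(',' in line)          # False
print(len(pieces))          # 1
print(pieces[0] == line)    # True

# pieces[1] would raise IndexError here, so we fall back to a default of our choice
second = pieces[1] if len(pieces) > 1 else ''
print(repr(second))         # ''
```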


QUESTION: Look at this code. Will it print something? Or will it produce an error?

1. "revolving\tdoor".split()

2. "take great\t\ncare".split()

3. "do not\tforget\nabout\tme".split('\t')

4. "non ti scordar\ndi\tme".split(' ')

5. "The Guardian of the Abyss stared at us".split('abyss')[1]

6. "".split('abyss')[0]

7. "abyss_OOOO_abyss".split('abyss')[0]


Exercise - trash dance

You’ve been hired to dance in the latest video of the notorious band Melodic Trash. You can’t miss this golden opportunity. Excited, you start reading the score, but you find a lot of errors - of course the band doesn’t need to know how to write scores to get TV time. There are strange symbols, and the last bar (the one after the sixth) is too long: its words need to be put one per row. Write some code which fixes the score and stores it in a list dance.

• DO NOT write string constants from the input in your code (so no "Ra Ta Pam" …)

Example - given:

music = "Zam Dam\tZa Bum Bum\tZam\tBam To Tum\tRa Ta Pam\tBar Ra\tRammaGumma  Unza\n\t\nTACAUACA \n BOOMBOOM!"


after your code it must result:

>>> print(dance)
['Zam Dam',
'Za Bum Bum',
'Zam',
'Bam To Tum',
'Ra Ta Pam',
'Bar Ra',
'RammaGumma',
'Unza',
'TACAUACA',
'BOOMBOOM!']

[7]:

music = "Zam Dam\tZa Bum Bum\tZam\tBam To Tum\tRa Ta Pam\tBar Ra\tRammaGumma  Unza\n\t\nTACAUACA \n BOOMBOOM!"

# write here


Exercise - Trash in tour

The Melodic Trash band strikes again! In a new tour they present their summer hits. The record company only provides the sales numbers in Anglo-Saxon format, so before communicating them to the Italian media we need a conversion.

Write some code which, given the hits and a position in the hit parade (from 1 to 4), prints the song title and its sales number.

• NOTE: commas must be substituted with dots

Example - given:

hits = """6,230,650 - I love you like the moldy tomatoes in the fridge
2,000,123 - The pain of living filthy rich
100,000 - Groupies are never enough
837 - Do you remember the trashcans in the summer..."""

position = 1   # the tomatoes
#position = 4  # the trashcans


Prints:

Number 1 in hit parade "I love you like the moldy tomatoes in the fridge" sold 6.230.650 copies

[8]:

hits = """6,230,650 - I love you like the moldy tomatoes in the fridge
2,000,123 - The pain of living filthy rich
100,000 - Groupies are never enough
837 - Do you remember the trashcans in the summer..."""

position = 1   # the tomatoes
#position = 4  # the trashcans

# write here


Exercise - manylines

Given the following string of text:

"""This is a string
of text on
several lines which tells nothing."""

1. print it

2. print how many lines, words and characters it contains

3. sort the words in alphabetical order and print the first and the last one

You should obtain:

This is a string
of text on
several lines which tells nothing.

Lines: 3   words: 12   chars: 62

['T', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 's', 't', 'r', 'i', 'n', 'g', '\n', 'o', 'f', ' ', 't', 'e', 'x', 't', ' ', 'o', 'n', '\n', 's', 'e', 'v', 'e', 'r', 'a', 'l', ' ', 'l', 'i', 'n', 'e', 's', ' ', 'w', 'h', 'i', 'c', 'h', ' ', 't', 'e', 'l', 'l', 's', ' ', 'n', 'o', 't', 'h', 'i', 'n', 'g', '.']
62

First word: This
Last word : which
['This', 'a', 'is', 'lines', 'nothing.', 'of', 'on', 'several', 'string', 'tells', 'text', 'which']

[9]:

s = """This is a string
of text on
several lines which tells nothing."""

# write here


Exercise - takechars

✪ Given a phrase which contains exactly 3 words, where the central word is always a number $$n$$, write some code which PRINTS the first $$n$$ characters of the third word.

Example - given:

phrase = "Take 4 letters"

prints:

lett

[10]:

phrase = "Take 4 letters"        # lett
#phrase= "Getting 5 caratters"   # carat
#phrase= "Take 10 characters"    # characters

# write here


count method

We can find the number of occurrences of a certain element in a list by using the method count:

[11]:

la = ['a', 'n', 'a', 'c', 'o', 'n', 'd', 'a']

[12]:

la.count('n')

[12]:

2

[13]:

la.count('a')

[13]:

3

[14]:

la.count('d')

[14]:

1


Do not abuse count

WARNING: count is often used in wrong or inefficient ways

Always ask yourself:

1. Could the list contain duplicates? Remember they will all get counted!

2. Could the list contain no duplicates? Remember to also handle this case!

3. count searches the whole list, which could be inefficient: is it really needed, or do we already know the interval where to search?
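A short sketch of some alternatives to a blind count, on a made-up list `tools`. Note that slicing creates a new list holding a copy of the selected cells.

```python
tools = ['rake', 'axe', 'rake', 'saw', 'rake', 'axe']

# to know IF an element is present, 'in' says it directly
print('saw' in tools)            # True
print(tools.count('saw') > 0)    # True, but less readable

# a missing element is not an error for count: we just get zero
print(tools.count('hammer'))     # 0

# if only the first three cells matter, count on a slice
print(tools[:3].count('rake'))   # 2
```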

QUESTION: Look at the following code fragments, and for each of them try guessing the result (or if it produces an error)

1. ['A','aa','a','aaAah',"a", "aaaa"[1], " a "].count("a")

2. ["the", "punishment", "of", "the","fools"].count('Fools') == 1

3. lst = ['oasis','date','oasis','coconut','date','coconut']
print(lst.count('date') == 1)

4. lst = ['oasis','date','oasis','coconut','date','coconut']
print(lst[4] == 'date')

5. ['2',2,"2",2,float("2"),2.0, 4/2,"1+1",int('3')-float('1')].count(2)

6. [].count([])

7. [[],[],[]].count([])


Exercise - country life

Given a list country, write some code which prints True if the number of occurrences of el1 in the first half equals the number of occurrences of el2 in the second half.

[15]:

el1,el2 = 'shovels', 'hoes'          # True
#el1,el2 = 'shovels', 'shovels'      # False
#el1,el2 = 'wheelbarrows', 'plows'   # True
#el1,el2 = 'shovels', 'wheelbarrows' # False

country = ['plows','wheelbarrows', 'shovels',      'wheelbarrows', 'shovels','hoes', 'wheelbarrows',
'hoes', 'plows',        'wheelbarrows', 'plows',        'shovels','plows','hoes']

# write here


index method

The index method allows us to find the index of the FIRST occurrence of an element.

[16]:

#      0   1   2   3   4
la = ['p','a','e','s','e']

[17]:

la.index('p')

[17]:

0

[18]:

la.index('a')

[18]:

1

[19]:

la.index('e')  # we find the FIRST occurrence

[19]:

2


If the element we’re looking for is not present, we will get an error:

>>> la.index('z')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-303-32d9c064ebe0> in <module>
----> 1 la.index('z')

ValueError: 'z' is not in list


Optionally, you can specify an index to start from (included):

[20]:

# 0   1   2   3   4   5   6   7   8   9   10
['a','c','c','a','p','a','r','r','a','r','e'].index('a',6)

[20]:

8


And also where to end (excluded):

# 0   1   2   3   4   5   6   7   8   9   10
['a','c','c','a','p','a','r','r','a','r','e'].index('a',6,8)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-7f344c26b62e> in <module>
1 # 0   1   2   3   4   5   6   7   8   9   10
----> 2 ['a','c','c','a','p','a','r','r','a','r','e'].index('a',6,8)

ValueError: 'a' is not in list
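When start and end are given, the position returned still refers to the original list. Searching inside a slice instead gives a position relative to the slice. A minimal sketch, with a made-up list `letters`:

```python
#          0    1    2    3    4    5
letters = ['b', 'a', 'n', 'a', 'n', 'a']

# search restricted to indexes 2 (included) .. 5 (excluded):
# the result is a position in the ORIGINAL list
print(letters.index('a', 2, 5))      # 3

# searching inside a slice gives a position relative to the SLICE
print(letters[2:5].index('a'))       # 1
print(letters[2:5].index('a') + 2)   # 3
```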


Do not abuse index

WARNING: index is often used in wrong or inefficient ways

Always ask yourself:

1. Could the list contain duplicates? Remember only the first will be found!

2. Could the list not contain the searched element? Remember to also handle this case!

3. index searches the whole list, which could be inefficient: is it really needed, or do we already know the interval where to search?

4. If we want to know whether an element is in a position we already know, index is useless: it’s enough to write my_list[3] == element. If you used index, it could find duplicates located before or after the position we are interested in!
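The sketch below illustrates two of the points above on a made-up list `fruits`. The -1 returned for a missing element is only our own "not found" marker, not something index provides.

```python
fruits = ['fig', 'date', 'fig', 'lime']

# checking a KNOWN position needs no search at all
print(fruits[2] == 'fig')          # True
# index stops at the duplicate in cell 0, so this comparison is misleading
print(fruits.index('fig') == 2)    # False

# guarding against a missing element before calling index
wanted = 'kiwi'
position = fruits.index(wanted) if wanted in fruits else -1
print(position)                    # -1
```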

QUESTION: Look at the following code fragments, and for each one try guessing the result it produces (or if it gives error).

1. ['arc','boat','hollow','dune'].index('hollow') == ['arc','boat','hollow','dune'].index('hollow',1)

2. ['azure','blue','sky blue','smurfs'][-1:].index('sky blue')

3. road = ['asphalt','bitumen','cement','gravel']

4. road = ['asphalt','bitumen','cement','gravel']

5. road = ['asphalt','bitumen','mortar','gravel']

6. la = [0,5,10]
la.reverse()
print(la.index(5) > la.index(10))


Exercise - Spatoč

In the past you met the Slavic painter Spatoč when he was still dirt poor. He gifted you with 2 or 3 paintings (you don’t remember) of dubious artistic value that you hid in the attic, but now watching TV you just noticed that Spatoč has gained international fame. You run to the attic to retrieve the paintings, which are lost among junk. Every painting is contained in a [ ] box, but you don’t know in which rack it is. Write some code which prints where they are.

• racks are numbered from 1. If the third painting was not found, print 0.

• DO NOT use loops nor if

• HINT: printing first two is easy - to print the last one have a look at Booleans - evaluation order

Example 1 - given:

[21]:

#        1      2           3             4             5
attic = [3,    '\\',       ['painting'], '---',        ['painting'],
#        6      7           8             9             10
         5.23, ['shovel'], ['ski'],      ["painting"], ['lamp']]


prints:

rack of first painting : 3
rack of second painting: 5
rack of third painting : 9


Example 2 - given:

[22]:

        # 1           2     3       4            5          6          7
attic = [['painting'],'--',['ski'],['painting'],['statue'],['shovel'],['boots']]


prints

rack of first painting : 1
rack of second painting: 4
rack of third painting : 0

[23]:

      #  1 2     3           4      5           6     7          8       9             10
attic = [3,'\\',['painting'],'---',['painting'],5.23,['shovel'],['ski'],['painting'], ['lamp']]
#  3,5,9
# 1           2     3       4            5          6          7
#attic = [['painting'],'--',['ski'],['painting'],['statue'],['shovel'],['boots']]
#  1,4,0

# write here


remove method

remove takes an object as parameter, searches for the FIRST cell containing that object and eliminates it:

[24]:

#     0 1 2 3 4 5
la = [6,7,9,5,9,8]   # 9 is in the cells at index 2 and 4

[25]:

la.remove(9)   # searches first cell containing 9

[26]:

la

[26]:

[6, 7, 5, 9, 8]


As you can see, the cell which was at index 2 and that contained the FIRST occurrence of 9 has been eliminated. The cell containing the SECOND occurrence of 9 is still there.

If you try removing an object which is not present, you will receive an error:

la.remove(666)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-121-5d04a71f9d33> in <module>
----> 1 la.remove(666)

ValueError: list.remove(x): x not in list


Do not abuse remove

WARNING: remove is often used in wrong or inefficient ways

Always ask yourself:

1. Could the list contain duplicates? Remember only the first will be removed!

2. Could the list not contain the searched element? Remember to also handle this case!

3. remove searches the whole list, which could be inefficient: is it really needed, or do we already know the position i of the element to be removed? In that case it’s much better to use .pop(i)
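A small comparison of pop and remove on made-up lists `queue` and `other`, assuming we already know the position of the cell to delete:

```python
#          0    1    2    3
queue = ['x', 'y', 'x', 'z']

# we KNOW the element to drop is at index 2: pop goes straight there
dropped = queue.pop(2)
print(dropped)   # x
print(queue)     # ['x', 'y', 'z']

# remove searches from the left and deletes the 'x' at index 0 instead
other = ['x', 'y', 'x', 'z']
other.remove('x')
print(other)     # ['y', 'x', 'z']

# a missing element makes remove fail, so we can check with 'in' beforehand
print('w' in other)   # False
```

Unlike remove, which returns None, pop also returns the deleted element.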

QUESTION: Look at the following code fragments, and for each try guessing the result (or if it produces an error).

1. la = ['a','b','c','b']
la.remove('b')
print(la)

2. la = ['a','b','c','b']
x = la.remove('b')
print(x)
print(la)

3. la = ['a','d','c','d']
la.remove('b')
print(la)

4. la = ['a','bb','c','bbb']
la.remove('b')
print(la)

5. la = ['a','b','c','b']
la.remove('B')
print(la)

6. la = ['a',9,'99',9,'c',str(9),'999']
la.remove("9")
print(la)

7. la = ["don't", "trick","me"]
la.remove("don't").remove("trick").remove("me")
print(la)

8. la = ["don't", "trick","me"]
la.remove("don't")
la.remove("trick")
la.remove("me")
print(la)

9. la = [4,5,7,10]
11 in la or la.remove(11)
print(la)

10. la = [4,5,7,10]
11 in la and la.remove(11)
print(la)

11. la = [4,5,7,10]
5 in la and la.remove(5)
print(la)

12. la = [9, [9], [[9]], [[[9]]] ]
la.remove([9])
print(la)

13. la = [9, [9], [[9]], [[[9]]] ]
la.remove([[9]])
print(la)


Exercise - nob

Write some code which removes from list la all the numbers contained in the three-element list lb.

• your code must work with any list la and any list lb of three elements

• you can assume that list la contains exactly TWO occurrences of all the elements of lb (plus also other numbers)

Example - given:

lb = [8,7,4]
la = [7,8,11,8,7,4,5,4]


after your code it must result:

>>> print(la)
[11, 5]

[27]:

lb = [8,7,4]
la = [7,8,11,8,7,4,5,4]

# write here


Continue

Go on with first challenges
