Town events

We will work on a dataset of events that took place in the Municipality of Trento (Italy) during 2019-20. Each event can be held on a single day, on two days, or over many days specified as a range. Event dates are written in natural language, so we will try to extract them, keeping in mind that the information can sometimes be partial or absent.

Data source: Municipality of Trento, released under Creative Commons Attribution 4.0 licence.

What to do

  1. Unzip the exercises zip into a folder; you should obtain something like this:

town-events-prj
    town-events.ipynb
    town-events-sol.ipynb
    eventi.csv
    jupman.py

WARNING: to display the notebook correctly, it MUST be in an unzipped folder!

  2. Open Jupyter Notebook from that folder. Two things should open: first a console and then a browser. The browser should show a file list: navigate the list and open the notebook town-events.ipynb

  3. Keep reading the notebook, and write in the appropriate cells when asked

Shortcut keys:

  • to execute Python code inside a Jupyter cell, press Control + Enter

  • to execute Python code inside a Jupyter cell AND select next cell, press Shift + Enter

  • to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter

  • If the notebook looks stuck, try selecting Kernel -> Restart

The dataset

Let’s have a look at the dataset eventi.csv. Note that we use pandas to show some data, but it’s not actually necessary to solve the exercises.

[1]:
import pandas as pd
import numpy as np

eventi = pd.read_csv('eventi.csv', encoding='UTF-8') # remember the encoding !
eventi.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 253 entries, 0 to 252
Data columns (total 35 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   remoteId                     253 non-null    object
 1   published                    253 non-null    object
 2   modified                     253 non-null    object
 3   Priorità                     253 non-null    int64
 4   Evento speciale              0 non-null      float64
 5   Titolo                       253 non-null    object
 6   Titolo breve                 1 non-null      object
 7   Sottotitolo                  227 non-null    object
 8   Descrizione                  224 non-null    object
 9   Locandina                    16 non-null     object
 10  Inizio                       253 non-null    object
 11  Termine                      252 non-null    object
 12  Quando                       253 non-null    object
 13  Orario                       251 non-null    object
 14  Durata                       6 non-null      object
 15  Dove                         252 non-null    object
 16  lat                          253 non-null    float64
 17  lon                          253 non-null    float64
 18  address                      241 non-null    object
 19  Pagina web                   201 non-null    object
 20  Contatto email               196 non-null    object
 21  Contatto telefonico          196 non-null    object
 22  Informazioni                 62 non-null     object
 23  Costi                        132 non-null    object
 24  Immagine                     252 non-null    object
 25  Evento - manifestazione      252 non-null    object
 26  Manifestazione cui fa parte  108 non-null    object
 27  Tipologia                    252 non-null    object
 28  Materia                      252 non-null    object
 29  Destinatari                  24 non-null     object
 30  Circoscrizione               109 non-null    object
 31  Struttura ospitante          220 non-null    object
 32  Associazione                 1 non-null      object
 33  Ente organizzatore           0 non-null      float64
 34  Identificativo               0 non-null      float64
dtypes: float64(5), int64(1), object(29)
memory usage: 69.3+ KB

We will focus on the Quando (When) column:

[2]:
eventi['Quando']
[2]:
0      venerdì 5 aprile alle 20:30 in via degli Olmi ...
1                                Giovedì 7 novembre 2019
2                               Giovedì 14 novembre 2019
3                               Giovedì 21 novembre 2019
4                               Giovedì 28 novembre 2019
                             ...
248                               sabato 9 novembre 2019
249             da venerdì 8 a domenica 10 novembre 2019
250                              giovedì 7 novembre 2019
251                             giovedì 28 novembre 2019
252                             giovedì 21 novembre 2019
Name: Quando, Length: 253, dtype: object
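Since pandas is only used for display, the Quando column can also be read with Python's standard csv module. This is a minimal sketch assuming eventi.csv is comma-separated with a header row (the pandas defaults the read_csv call above relies on); the function name quando_column is our own, hypothetical choice.

```python
import csv

def quando_column(lines):
    """Return the 'Quando' values from CSV text that has a header row.

    `lines` can be an open file or any iterable of strings,
    which is what csv.DictReader accepts.
    """
    return [row['Quando'] for row in csv.DictReader(lines)]

# Hypothetical usage on the real file:
# with open('eventi.csv', encoding='UTF-8', newline='') as f:
#     dates = quando_column(f)
```

Using DictReader means columns are looked up by header name, so the position of Quando among the 35 columns does not matter, and quoted fields containing commas are handled by the csv module.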

1. leap_year

✪ A leap year has 366 days instead of the regular 365. You are given the following criteria to detect whether or not a year is a leap year. Implement them in a function which, given a year as a number, RETURNs True if it is a leap year, False otherwise.

IMPORTANT: in Python there are predefined methods to detect leap years, but here you MUST write your own code!

  1. If the year is evenly divisible by 4, go to step 2. Otherwise, go to step 5.

  2. If the year is evenly divisible by 100, go to step 3. Otherwise, go to step 4.

  3. If the year is evenly divisible by 400, go to step 4. Otherwise, go to step 5.

  4. The year is a leap year (it has 366 days)

  5. The year is not a leap year (it has 365 days)

(if you’re curious about calendars, see this link)
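The five steps map directly onto three divisibility checks with the % operator. Here is one possible sketch; it uses the hypothetical name is_leap_steps so it does not clash with the is_leap function you are asked to write.

```python
def is_leap_steps(year):
    """Follow the five leap-year steps literally, using % to test divisibility."""
    if year % 4 != 0:       # step 1: not divisible by 4 -> step 5 (not leap)
        return False
    if year % 100 != 0:     # step 2: divisible by 4 but not by 100 -> step 4 (leap)
        return True
    return year % 400 == 0  # step 3: divisible by 100, leap only if also divisible by 400

print(is_leap_steps(2000), is_leap_steps(1900), is_leap_steps(2019))  # True False False
```

Outside this exercise, the standard library's calendar.isleap follows the same Gregorian rules and can be used to double-check results.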

[3]:
def is_leap(year):
    raise Exception('TODO IMPLEMENT ME !')


assert is_leap(4)    == True
assert is_leap(104)  == True
assert is_leap(204)  == True
assert is_leap(400)  == True
assert is_leap(1600) == True
assert is_leap(2000) == True
assert is_leap(2400) == True
assert is_leap(2004) == True
assert is_leap(2008) == True
assert is_leap(2012) == True

assert is_leap(1)    == False
assert is_leap(5)    == False
assert is_leap(100)  == False
assert is_leap(200)  == False
assert is_leap(1700) == False
assert is_leap(1800) == False
assert is_leap(1900) == False
assert is_leap(2100) == False
assert is_leap(2200) == False
assert is_leap(2300) == False
assert is_leap(2500) == False
assert is_leap(2600) == False

2. full_date

WARNING: avoid constants in function bodies !!

In the exercise data you will find many names and connectives such as 'Giovedì', 'Novembre', 'e', 'a', etc. DO NOT put such constants inside your code; use the provided lists (DAYS, MONTHS…) instead!! You have to write generic code that works with any input.

✪✪ Write a function full_date which takes natural language text representing a complete date and RETURNs a string in the format yyyy-mm-dd, like 2019-03-25.

  • Dates will be expressed in Italian: the Italian day names, month names and day phases are provided as lists in the code cell below

  • your function should work regardless of the capitalization of the input

  • we assume the date is always well formed

Examples:

At the beginning you always have the day name (Mercoledì means Wednesday):

>>> full_date("Mercoledì 13 Novembre 2019")
"2019-11-13"

Right after the day name, you may also find a day phase, like mattina (morning):

>>> full_date("Mercoledì mattina 13 Novembre 2019")
"2019-11-13"

Remember that the text may be lowercase, and single-digit days must be padded with a leading zero:

>>> full_date("domenica 4 dicembre 1923")
"1923-12-04"

For more examples, see assertions.
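The examples suggest a word-by-word scan: lowercase the text, skip the day name and an optional day phase, then read day, month and year. Below is a minimal sketch under the stated assumption that the input is always well formed; the name full_date_sketch is hypothetical, and, following the warning about constants, the month number comes from the word's position in MONTHS rather than from hard-coded names.

```python
MONTHS = ['gennaio', 'febbraio', 'marzo', 'aprile', 'maggio', 'giugno',
          'luglio', 'agosto', 'settembre', 'ottobre', 'novembre', 'dicembre']
DAY_PHASES = ['mattina', 'pomeriggio', 'sera', 'notte']

def full_date_sketch(text):
    """Convert a complete Italian date like 'Giovedì 7 novembre 2019' to 'yyyy-mm-dd'."""
    words = text.lower().split()            # lowercase so capitalization does not matter
    i = 1                                   # words[0] is always the day name
    if words[i] in DAY_PHASES:              # optional day phase right after the day name
        i += 1
    day = int(words[i])
    month = MONTHS.index(words[i + 1]) + 1  # list position 0..11 -> month 1..12
    year = int(words[i + 2])
    return f"{year:04d}-{month:02d}-{day:02d}"  # :02d adds the leading zero

print(full_date_sketch("sabato mattina 25 marzo 2017"))  # 2017-03-25
```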

[4]:
DAYS = ['lunedì', 'martedì', 'mercoledì', 'giovedì', 'venerdì', 'sabato', 'domenica']

MONTHS = ['gennaio', 'febbraio', 'marzo'    , 'aprile' , 'maggio'  , 'giugno',
          'luglio' , 'agosto'  , 'settembre', 'ottobre', 'novembre', 'dicembre' ]

#             morning,   afternoon,   evening, night
DAY_PHASES = ['mattina', 'pomeriggio', 'sera', 'notte']


def full_date(text):
    raise Exception('TODO IMPLEMENT ME !')

assert full_date("Giovedì 14 novembre 2019") == "2019-11-14"
assert full_date("Giovedì 7 novembre 2019") == "2019-11-07"
assert full_date("Giovedì pomeriggio 14 novembre 2019") == "2019-11-14"
assert full_date("sabato mattina 25 marzo 2017") == "2017-03-25"
assert full_date("Mercoledì 13 Novembre 2019") == "2019-11-13"
assert full_date("domenica 4 dicembre 1923") == "1923-12-04"

3. partial_date

✪✪✪ Write a function partial_date which takes natural language text representing one or more dates and RETURNs only the FIRST date found, in the format yyyy-mm-dd. If the FIRST date does not contain enough information to form a complete date, leave in the returned date the characters 'yyyy' for an unknown year, 'mm' for an unknown month and 'dd' for an unknown day.

NOTE: here we only care about the FIRST date: DO NOT attempt to fetch any missing information from the second date, we will deal with that in a later exercise.

Examples:

>>> partial_date("Giovedì 7 novembre 2019")
"2019-11-07"

>>> partial_date("venerdì 15 novembre")
"yyyy-11-15"

>>> partial_date("venerdì pomeriggio 15 e sabato mattina 16 novembre 2019")
"yyyy-mm-15"

For more examples, see asserts.
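Partial dates need the same scan as full_date, but with a check before each optional piece. This sketch assumes the layout seen in the asserts: an optional leading da, a day name, an optional day phase, a day number, then possibly a month and, only after a month, a numeric year. Any other word (a connective, an hour, a place) leaves the remaining parts as placeholders. The name partial_date_sketch is hypothetical.

```python
MONTHS = ['gennaio', 'febbraio', 'marzo', 'aprile', 'maggio', 'giugno',
          'luglio', 'agosto', 'settembre', 'ottobre', 'novembre', 'dicembre']
DAY_PHASES = ['mattina', 'pomeriggio', 'sera', 'notte']
CONNECTIVE_FROM = 'da'

def partial_date_sketch(text):
    """Return the FIRST date in `text` as yyyy-mm-dd, with placeholders for missing parts."""
    words = text.lower().split()
    i = 0
    if words[i] == CONNECTIVE_FROM:   # optional leading 'da' (from)
        i += 1
    i += 1                            # skip the day name
    if i < len(words) and words[i] in DAY_PHASES:
        i += 1                        # skip the optional day phase
    day = int(words[i])
    month, year = 'mm', 'yyyy'
    # a month counts only if it directly follows the day
    if i + 1 < len(words) and words[i + 1] in MONTHS:
        month = '%02d' % (MONTHS.index(words[i + 1]) + 1)
        # a year counts only if it directly follows the month
        if i + 2 < len(words) and words[i + 2].isdigit():
            year = words[i + 2]
    return f"{year}-{month}-{day:02d}"
```

Checking bounds before each lookahead is what lets short inputs such as "venerdì pomeriggio 15" fall back to placeholders instead of raising an IndexError.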

[5]:
CONNECTIVE_AND = 'e'

CONNECTIVE_FROM = 'da'
CONNECTIVE_TO = 'a'

DAYS = ['lunedì', 'martedì', 'mercoledì', 'giovedì', 'venerdì', 'sabato', 'domenica']
MONTHS = ['gennaio', 'febbraio', 'marzo'    , 'aprile' , 'maggio'  , 'giugno',
          'luglio' , 'agosto'  , 'settembre', 'ottobre', 'novembre', 'dicembre' ]

             # morning,   afternoon,   evening, night
DAY_PHASES = ['mattina', 'pomeriggio', 'sera', 'notte']

def partial_date(text):
    raise Exception('TODO IMPLEMENT ME !')

# complete, uppercase day
assert partial_date("Giovedì 7 novembre 2019") == "2019-11-07"
assert partial_date("Giovedì 14 novembre 2019") == "2019-11-14"
# lowercase day
assert partial_date("mercoledì 13 novembre 2019") == "2019-11-13"
# lowercase, dayphase, missing month and year
assert partial_date("venerdì pomeriggio 15") == "yyyy-mm-15"
# single day, lowercase, no year
assert partial_date("venerdì 15 novembre") == "yyyy-11-15"

# no year,   hour / location to be discarded
assert partial_date("venerdì 5 aprile alle 20:30 in via degli Olmi 26 (Trento sud)")\
                    == "yyyy-04-05"

# two dates, 'and' connective ('e'), day phase morning/afternoon ('mattina'/'pomeriggio')
assert partial_date("venerdì pomeriggio 15 e sabato mattina 16 novembre 2019") \
                    == "yyyy-mm-15"

# two dates, begins with connective 'Da'
assert partial_date("Da lunedì 25 novembre a domenica 01 dicembre 2019") == "yyyy-11-25"
assert partial_date("da giovedì 12 a domenica 15 dicembre 2019") == "yyyy-mm-12"
assert partial_date("da giovedì 9 a domenica 12 gennaio 2020") == "yyyy-mm-09"
assert partial_date("Da lunedì 04 a domenica 10 novembre 2019") == "yyyy-mm-04"

4. parse_dates_and

✪✪✪ Write a function parse_dates_and which, given a string representing two possibly partial dates separated by the e connective (and), RETURNs a tuple holding the two extracted dates, each in the format yyyy-mm-dd.

  • IMPORTANT: notice that the year or month of the first date might actually be indicated in the second date! In this exercise we want missing information in the first date to be filled in with the year and/or month taken from the second date.

  • HINT: implement this function by calling the previously defined functions. If you do so, it will be fairly easy.

Examples:

>>> parse_dates_and("venerdì pomeriggio 15 e sabato mattina 16 novembre 2019")
("2019-11-15", "2019-11-16")

>>> parse_dates_and("lunedì 4 e domenica 10 novembre")
("yyyy-11-04","yyyy-11-10")

For more examples, see asserts.
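Following the hint, the work splits into three steps: cut the text at the e connective, parse each half as a partial date, then copy year and month from the second date into the placeholders of the first. To keep this sketch self-contained it repeats the partial-date scan as a helper that works on a list of words; both first_partial_date and parse_dates_and_sketch are hypothetical names, and the sketch assumes the first e word in the text is the connective.

```python
CONNECTIVE_AND = 'e'
CONNECTIVE_FROM = 'da'
MONTHS = ['gennaio', 'febbraio', 'marzo', 'aprile', 'maggio', 'giugno',
          'luglio', 'agosto', 'settembre', 'ottobre', 'novembre', 'dicembre']
DAY_PHASES = ['mattina', 'pomeriggio', 'sera', 'notte']

def first_partial_date(words):
    """Parse the first date in a list of lowercase words into yyyy-mm-dd with placeholders."""
    i = 1 if words[0] == CONNECTIVE_FROM else 0
    i += 1                                    # skip the day name
    if i < len(words) and words[i] in DAY_PHASES:
        i += 1                                # skip the optional day phase
    day = int(words[i])
    month, year = 'mm', 'yyyy'
    if i + 1 < len(words) and words[i + 1] in MONTHS:
        month = '%02d' % (MONTHS.index(words[i + 1]) + 1)
        if i + 2 < len(words) and words[i + 2].isdigit():
            year = words[i + 2]
    return f"{year}-{month}-{day:02d}"

def parse_dates_and_sketch(text):
    words = text.lower().split()
    k = words.index(CONNECTIVE_AND)           # position of the 'e' connective
    first = first_partial_date(words[:k])
    second = first_partial_date(words[k + 1:])
    y1, m1, d1 = first.split('-')
    y2, m2, _ = second.split('-')
    # fill in only the parts the first date is missing
    if y1 == 'yyyy':
        y1 = y2
    if m1 == 'mm':
        m1 = m2
    return (f"{y1}-{m1}-{d1}", second)
```

Only the first date borrows information; the second is returned exactly as parsed, which matches the asserts below, where the second date never takes anything from the first.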

[6]:

def parse_dates_and(text):
    raise Exception('TODO IMPLEMENT ME !')


# complete dates
assert parse_dates_and("lunedì 25 aprile 2018 e domenica 01 dicembre 2019") == ("2018-04-25","2019-12-01")

# exactly two dates, day phase morning/afternoon ('mattina'/'pomeriggio')
assert parse_dates_and("venerdì pomeriggio 15 e sabato mattina 16 novembre 2019") == ("2019-11-15", "2019-11-16")

# first date missing year
assert parse_dates_and("lunedì 13 settembre e sabato 25 dicembre 2019") == ("2019-09-13","2019-12-25")

# first date missing month and year
assert parse_dates_and("Giovedì 12 e domenica 15 dicembre 2019") == ("2019-12-12","2019-12-15")

assert parse_dates_and("giovedì 9 e domenica 12 gennaio 2020") == ("2020-01-09", "2020-01-12")

assert parse_dates_and("lunedì 4 e domenica 10 novembre 2019") == ("2019-11-04","2019-11-10")

# first missing month and year, second missing year
assert parse_dates_and("lunedì 4 e domenica 10 novembre") == ("yyyy-11-04","yyyy-11-10")

# first missing month and year, second missing month and year
assert parse_dates_and("lunedì 4 e domenica 10") == ("yyyy-mm-04","yyyy-mm-10")