Phone calls
A radio station gathered calls from listeners, recording just the name of the caller and the phone number as seen on the phone display. For marketing purposes, the station owner now wants to better understand the locations listeners were calling from. He hires you as Algorithmic Market Strategist and asks you to show statistics about the provinces of the calling sites. There is a problem, though: numbers were written down by hand and are sometimes not uniform, so it would be better to find a canonical representation.
NOTE: Phone prefixes can be a very tricky subject. If you ever have to deal with them seriously, please use a proper phone number parsing library and do read Falsehoods Programmers Believe About Phone Numbers
What to do
Unzip the exercises zip in a folder. You should obtain something like this:
phone-calls-prj
    phone-calls.ipynb
    phone-calls-sol.ipynb
    jupman.py
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !
Open Jupyter Notebook from that folder. Two things should open: first a console and then a browser. The browser should show a file list: navigate the list and open the notebook phone-calls.ipynb
Go on reading the notebook, and write in the appropriate cells when asked.
Shortcut keys:
to execute Python code inside a Jupyter cell, press Control + Enter
to execute Python code inside a Jupyter cell AND select the next cell, press Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter
If the notebooks look stuck, try to select Kernel -> Restart
1. canonical
✪ We first want to canonicalize a phone number as a string.
We suppose a canonical phone number:
contains no spaces
contains no international prefix, so no +39 nor 0039: we assume all calls were placed from Italy (even if they have an international prefix)
For example, all of these are canonicalized to “0461123456”:
+39 0461 123456
+390461123456
0039 0461 123456
00390461123456
These are canonicalized as follows:
328 123 4567 -> 3281234567
0039 328 123 4567 -> 3281234567
0039 3771 1234567 -> 37711234567
REMEMBER: strings are immutable !!!!!
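As a small illustration of what immutability means in practice (not the exercise solution; the variable names here are made up for the example), string methods and slicing always return a new string, so the result has to be stored somewhere:

```python
# Strings are immutable: methods return a NEW string and leave the original unchanged.
raw = '+39 0461 123456'

no_spaces = raw.replace(' ', '')   # new string without spaces
print(raw)         # +39 0461 123456   (unchanged)
print(no_spaces)   # +390461123456

# Calling the method without storing the result has no visible effect
raw.replace(' ', '')
print(raw)         # +39 0461 123456

# Slicing also builds a new string; here it drops a leading '+39'
stripped = no_spaces
if no_spaces.startswith('+39'):
    stripped = no_spaces[len('+39'):]
print(stripped)    # 0461123456
```

Note the prefix is removed only when it appears at the very beginning, which matters for numbers such as 003902120039 that also contain 0039 at the end.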
def canonical(phone):
    """ RETURN the canonical version of phone as a string.
    """
    raise Exception('TODO IMPLEMENT ME !')
assert canonical('+39 0461 123456') == '0461123456'
assert canonical('+390461123456') == '0461123456'
assert canonical('0039 0461 123456') == '0461123456'
assert canonical('00390461123456') == '0461123456'
assert canonical('003902123456') == '02123456'
assert canonical('003902120039') == '02120039'
assert canonical('0039021239') == '021239'
2. prefix
✪✪ We now want to extract the province prefix: the ones we consider valid are in the province_prefixes list. Note that some numbers are from mobile operators, and you can distinguish them by prefixes like 328: the ones we consider are in the mobile_prefixes list. Write a function that, given a phone number, RETURNs the prefix of the phone as a string. Remember to make it canonical first !!
If the phone is mobile, RETURN the string 'mobile'. If it is neither a province number nor a mobile one, RETURN the string 'unrecognized'
To determine if the phone is mobile or from a province, use the provided province_prefixes and mobile_prefixes lists
USE the already defined function canonical(phone)
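One possible building block for this kind of check is str.startswith. The sketch below uses a hypothetical helper find_prefix, which is not part of the exercise, and it assumes the number has already been made canonical:

```python
def find_prefix(number, prefixes):
    """Return the first element of prefixes that number starts with, or None.

    Hypothetical helper for illustration only; it expects an already
    canonical number.
    """
    for p in prefixes:
        if number.startswith(p):
            return p
    return None

sample_provinces = ['0461', '02', '011']   # same values as province_prefixes

print(find_prefix('0461123', sample_provinces))   # 0461
print(find_prefix('12345', sample_provinces))     # None
```

Returning None when nothing matches lets the caller decide what to do next, for example trying the mobile list before falling back to 'unrecognized'.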
province_prefixes = ['0461', '02', '011']
mobile_prefixes = ['330', '340', '328', '390', '3771']
def prefix(phone):
    raise Exception('TODO IMPLEMENT ME !')
assert prefix('0461123') == '0461'
assert prefix('+39 0461 4321') == '0461'
assert prefix('0039011 432434') == '011'
assert prefix('328 432434') == 'mobile'
assert prefix('+39340 432434') == 'mobile'
assert prefix('00666011 432434') == 'unrecognized'
assert prefix('12345') == 'unrecognized'
assert prefix('+39 123 12345') == 'unrecognized'
3. hist
✪✪✪ Write a function that, given a list of non-canonical phones, RETURNs a dictionary where the keys are the prefixes of the canonical phones and the values are the frequencies of the prefixes (keys may also be unrecognized or mobile)
NOTE: Numbers corresponding to the same phone (that is, numbers with the same canonical representation) must be counted ONLY ONCE!
USE the already defined functions canonical(phone) AND prefix(phone)
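The general pattern of counting each normalized item only once can be sketched on unrelated data. The helper count_unique and its parameters normalize and group are hypothetical names chosen for this example; a set remembers what was already seen, and a dictionary holds the counts:

```python
def count_unique(items, normalize, group):
    """Count items per group, counting items with the same normalized
    form only once. Hypothetical helper for illustration only."""
    seen = set()      # normalized forms already counted
    counts = {}       # group key -> frequency
    for item in items:
        norm = normalize(item)
        if norm in seen:
            continue  # duplicate of something already counted
        seen.add(norm)
        key = group(norm)
        counts[key] = counts.get(key, 0) + 1
    return counts

words = ['Apple', 'apple', 'avocado', 'Banana', 'BANANA', 'blueberry']
print(count_unique(words, str.lower, lambda w: w[0]))
# {'a': 2, 'b': 2}
```

In the exercise, the roles of normalize and group would presumably be played by canonical and prefix.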
province_prefixes = ['0461', '02', '011']
mobile_prefixes = ['330', '340', '328', '390', '3771']
def hist(phones):
    raise Exception('TODO IMPLEMENT ME !')
assert hist(['0461123']) == {'0461':1}
assert hist(['123']) == {'unrecognized':1}
assert hist(['328 123']) == {'mobile':1}
assert hist(['0461123','+390461123']) == {'0461':1} # same canonicals, should be counted only once
assert hist(['0461123', '+39 0461 4321']) == {'0461':2}
assert hist(['0461123', '+39 0461 4321', '0039011 432434']) == {'0461':2, '011':1}
assert hist(['+39 02 423', '0461123', '02 426', '+39 0461 4321', '0039328 1234567', '02 423', '02 424']) == {'0461':2, 'mobile':1, '02':3}
4. display calls
✪✪ Using matplotlib, display a bar plot of the frequency of calls by prefix (including mobile and unrecognized), sorted in descending order so that the province with the highest number of calls comes first. Also, save the plot to disk with plt.savefig('prefixes-count.png') (call it before plt.show())
If you’re in trouble you can find plenty of examples in the visualization chapter
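The sorting step can be tried out on its own, without matplotlib. This is a minimal sketch on a hand-written sample dictionary (the names counts, labels and heights are made up for the example), not the full solution:

```python
# sample data with the same shape as a prefix -> frequency dictionary
counts = {'0461': 2, 'mobile': 1, '02': 3}

# sort (prefix, count) pairs by count, highest first
pairs = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

labels = [p for p, _ in pairs]    # bar labels, in plotting order
heights = [c for _, c in pairs]   # bar heights, in the same order

print(labels)    # ['02', '0461', 'mobile']
print(heights)   # [3, 2, 1]
```

Two parallel lists like these are a convenient input for a bar plot, since the i-th label then belongs to the i-th bar.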
You should obtain something like this:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
province_prefixes = ['0461', '02', '011']
mobile_prefixes = ['330', '340', '328', '390', '3771']
phones = ['+39 02 423', '0461123', '02 426', '+39 0461 4321', '0039328 1234567', '02 423', '02 424']
# write here