In this worked project we will visualize an intercity bus network in GTFS format. The original data was split into several files, which we merged into the dataset network-short.csv.

Data source: dati.trentino.it, MITT service, released under Creative Commons Attribution 4.0 licence.

REQUIREMENTS: Having read the Relational data tutorial, which also contains instructions for installing the required libraries.

[Figure: preview of the expected network]

What to do

  1. Unzip the exercises zip into a folder; you should obtain something like this:


WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder!

  2. Open Jupyter Notebook from that folder. Two things should open: first a console and then a browser. The browser should show a file list: navigate the list and open the notebook bus-network.ipynb

  3. Keep reading the notebook, and write in the appropriate cells when asked

Shortcut keys:

  • to execute Python code inside a Jupyter cell, press Control + Enter

  • to execute Python code inside a Jupyter cell AND select next cell, press Shift + Enter

  • to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter

  • If the notebook looks stuck, try selecting Kernel -> Restart


To visualize data, we will use the networkx library. Let’s first see an example of how to do it:

import networkx as nx
from soft import draw_nx

Gex = nx.DiGraph()

# we can force a horizontal (left-to-right) layout like this
# ('rankdir' is a Graphviz graph attribute):
Gex.graph['graph'] = {'rankdir': 'LR'}

# When we add nodes, we can identify them with an identifier like the
# stop_id, which is separate from the label, because in some unfortunate
# cases two different stops can share the same label.

Gex.add_node('1', label='Trento-Autostaz.',
             color='black', fontcolor='black')
Gex.add_node('723', label='Trento-Via Brescia 4',
             color='black', fontcolor='black')
Gex.add_node('870', label='Sarche Centro comm.',
             color='black', fontcolor='black')
Gex.add_node('1180', label='Trento Corso 3 Novembre',
             color='black', fontcolor='black')

# IMPORTANT: edges connect stop_ids, NOT labels !!!!
Gex.add_edge('723', '1')
Gex.add_edge('870', '1')
Gex.add_edge('1', '1180')

# draw_nx is defined in soft.py (the module imported above):
draw_nx(Gex)

Colors and additional attributes

Since we have a bus stop network, we might want to draw edges according to the route they belong to. Here we show how to do it only for the edge from Trento-Autostaz to Trento Corso 3 Novembre:

# we can retrieve an edge like this:

edge = Gex['1']['1180']

# and set attributes, like these:

edge['weight'] = 5                # it takes 5 minutes to go from Trento-Autostaz
                                  # to Trento Corso 3 Novembre
edge['label'] = str(5)            # the label is a string

edge['color'] = '#2ca02c'         # we can set some style for the edge, such as color
edge['penwidth']= 4               # and thickness

edge['route_short_name'] = 'B301' # we can add any attribute we want,
                                  # Note these custom ones won't show in the graph


To be more explicit, we can also add a legend this way:

draw_nx(Gex, [{'color': '#2ca02c', 'label': 'B301'}])
# Note an edge is a simple dictionary; printing Gex['1']['1180'] gives:
{'weight': 5, 'label': '5', 'color': '#2ca02c', 'penwidth': 4, 'route_short_name': 'B301'}


To load network-short.csv, we provide this function:

def load_stops():
    """Loads file data and RETURN a list of dictionaries with the stop times
    """
    import csv
    with open('network-short.csv', newline='', encoding='UTF-8') as csvfile:
        reader = csv.DictReader(csvfile)
        lst = []
        for d in reader:
            lst.append(d)   # one dictionary per CSV row
    return lst

stops = load_stops()

Each element of stops is a dictionary describing one stop along one trip; here are two of them:

[OrderedDict([('', '3'),
              ('route_id', '76'),
              ('agency_id', '12'),
              ('route_short_name', 'B202'),
              ('route_type', '3'),
              ('service_id', '22018091220190621'),
              ('trip_id', '0002402742018091220190621'),
              ('trip_headsign', 'Trento-Autostaz.'),
              ('direction_id', '0'),
              ('arrival_time', '06:27:00'),
              ('departure_time', '06:27:00'),
              ('stop_id', '5025'),
              ('stop_sequence', '4'),
              ('stop_code', '2620VE'),
              ('stop_name', 'Sardagna Civ.20'),
              ('stop_desc', ''),
              ('stop_lat', '46.073125'),
              ('stop_lon', '11.093579'),
              ('zone_id', '2620.0')]),
 OrderedDict([('', '4'),
              ('route_id', '76'),
              ('agency_id', '12'),
              ('route_short_name', 'B202'),
              ('route_type', '3'),
              ('service_id', '22018091220190621'),
              ('trip_id', '0002402742018091220190621'),
              ('trip_headsign', 'Trento-Autostaz.'),
              ('direction_id', '0'),
              ('arrival_time', '06:28:00'),
              ('departure_time', '06:28:00'),
              ('stop_id', '843'),
              ('stop_sequence', '5'),
              ('stop_code', '2620MS'),
              ('stop_name', 'Sardagna-Maso Scala'),
              ('stop_desc', ''),
              ('stop_lat', '46.069871'),
              ('stop_lon', '11.097749'),
              ('zone_id', '2620.0')])]
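As a small, self-contained illustration of what csv.DictReader does inside load_stops, the sketch below parses an in-memory string instead of the real file. The two rows are taken from the sample above, reduced to a few columns, and load_stops_from is a hypothetical helper, not part of the exercise. Note how the empty first header cell becomes the key '' seen in the sample.

```python
import csv
import io

# Two rows from the sample above, reduced to a few columns.
# The first header cell is empty, like the unnamed first column of the file.
SAMPLE_CSV = """,route_short_name,trip_id,arrival_time,stop_id,stop_name
3,B202,0002402742018091220190621,06:27:00,5025,Sardagna Civ.20
4,B202,0002402742018091220190621,06:28:00,843,Sardagna-Maso Scala
"""

def load_stops_from(text):
    """Hypothetical helper: parse CSV text into a list of dicts, one per row."""
    reader = csv.DictReader(io.StringIO(text))
    # the header row provides the keys of every dictionary
    return [dict(row) for row in reader]

rows = load_stops_from(SAMPLE_CSV)
print(rows[0]['stop_name'])   # Sardagna Civ.20
print(rows[0][''])            # 3  (value of the unnamed first column)
```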

1. extract_routes

Implement a function that extracts all the route_short_name values from the stops list and RETURNs them as an alphabetically sorted list without duplicates (see example).


>>> stops = load_stops()
>>> extract_routes(stops)
['B201', 'B202', 'B211', 'B217', 'B301']
import networkx as nx
from soft import draw_nx

def extract_routes(stps):
    raise Exception('TODO IMPLEMENT ME !')

extract_routes(stops)
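One possible approach, sketched so you can compare after trying yourself: a set drops duplicates, and sorted() turns it back into an alphabetically ordered list. The three-element sample list is hypothetical.

```python
def extract_routes(stps):
    """Return the distinct route_short_name values, sorted alphabetically."""
    # set comprehension removes duplicates; sorted() returns a new list
    return sorted({stop['route_short_name'] for stop in stps})

sample = [{'route_short_name': 'B301'},
          {'route_short_name': 'B202'},
          {'route_short_name': 'B202'}]
print(extract_routes(sample))   # ['B202', 'B301']
```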

2. to_int_min

Implement a function that takes a time string in a format like 08:27:42 and RETURNs the time since midnight in minutes, ignoring the seconds (e.g. 507).

def to_int_min(time_string):
    raise Exception('TODO IMPLEMENT ME !')

to_int_min('08:27:42')
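A minimal sketch of one way to do it: split on ':' and combine hours and minutes, discarding the seconds (8 * 60 + 27 = 507). GTFS allows times past 24:00:00 for trips that continue after midnight, so this sketch deliberately does not wrap around at 24 hours.

```python
def to_int_min(time_string):
    """Convert 'HH:MM:SS' to minutes since midnight, ignoring seconds."""
    hours, minutes, _seconds = time_string.split(':')
    return int(hours) * 60 + int(minutes)

print(to_int_min('08:27:42'))   # 507
```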

3. get_legend_edges

If you have n routes numbered from 0 to n-1 and want to assign a different color to each of them, you can use this function we provide:

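def get_color(i, n):
    """ RETURN the i-th color chosen from n possible colors, in
        hex format (e.g. #ff0018).

        - if n < 1, raise ValueError
        - if i < 0 or i >= n, raise ValueError
    """
    if n < 1:
        raise ValueError("Invalid n: %s" % n)
    if i < 0 or i >= n:
        raise ValueError("Invalid i: %s" % i)

    #HACKY, just for matplotlib < 3
    # matplotlib's default 'tab10' palette: colors repeat after 10 routes
    lst = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
           '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']

    return lst[i % 10]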


Now implement a function that RETURNs a list of dictionaries, where each dictionary represents a route with its label and associated color. Dictionaries are in the order returned by the extract_routes() function.


>>> get_legend_edges()
[{'label': 'B201', 'color': '#1f77b4'},
 {'label': 'B202', 'color': '#ff7f0e'},
 {'label': 'B211', 'color': '#2ca02c'},
 {'label': 'B217', 'color': '#d62728'},
 {'label': 'B301', 'color': '#9467bd'}]
def get_legend_edges():
    raise Exception('TODO IMPLEMENT ME !')

get_legend_edges()
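A sketch of the core idea, assuming get_color uses matplotlib's 'tab10' palette as the expected output suggests: the i-th route of n gets get_color(i, n). To keep the block runnable on its own it carries a copy of get_color, and legend_for is a hypothetical helper taking the route list explicitly; get_legend_edges() could then return legend_for(extract_routes(stops)).

```python
def get_color(i, n):
    """Copy of the provided helper, so this sketch runs on its own."""
    if n < 1:
        raise ValueError("Invalid n: %s" % n)
    if i < 0 or i >= n:
        raise ValueError("Invalid i: %s" % i)
    # matplotlib's default 'tab10' palette
    lst = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
           '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
    return lst[i % 10]

def legend_for(routes):
    """Hypothetical helper: one legend entry per route, in the given order."""
    n = len(routes)
    # enumerate gives each route its index i, used to pick the color
    return [{'label': route, 'color': get_color(i, n)}
            for i, route in enumerate(routes)]

print(legend_for(['B201', 'B202', 'B211', 'B217', 'B301']))
```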

4. calc_nx

Implement the function calc_nx, which RETURNs a NetworkX DiGraph representing the bus stop network.

  • To keep things simple, we suppose routes NEVER overlap (no edge is ever shared by two routes), so we need only a DiGraph and not a MultiGraph

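  • as label for nodes, use the stop_name, formatted nicely (in the expected graph, each word of the name goes on its own line)

  • as 'weight' for the edges, use the time in minutes between one stop and the next one, and as edge 'label' use the same value converted to a string

  • as custom property, add route_short_name

  • as 'color' for the edges, use the color given by the provided get_color(i, n) function, where i is the position of the edge's route in the list returned by extract_routes and n is the number of routes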

  • as 'penwidth' for edges, set 4
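The node labels shown in the Hubs section (for example 'Trento\nVia\nBrescia\n4' and 'Sarche\nCentro\nComm.') suggest one way to format stop names: treat hyphens as spaces, capitalize each word, and put each word on its own line. format_label is a hypothetical helper sketching that choice; it may not match the reference solution in every case.

```python
def format_label(stop_name):
    """Hypothetical formatter: capitalized words, one per line."""
    # hyphens become word separators; title() capitalizes each word;
    # split()/join() collapse any run of spaces into a single newline
    return '\n'.join(stop_name.replace('-', ' ').title().split())

print(format_label('Trento-Via Brescia 4'))
```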

IMPORTANT: notice that stops are already ordered by arrival_time; this makes it easy to find edges!

HINT: to make sure you’re on the right track, first try to represent a single route, such as B202.


def calc_nx(stops):
    raise Exception('TODO IMPLEMENT ME !')

G = calc_nx(stops)
draw_nx(G, get_legend_edges())
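The key step of calc_nx is finding the edges: consecutive stops of the same trip. Below is a sketch of just that step, under two assumptions: rows of each trip appear in travel order, and the edge time is measured from the departure at one stop to the arrival at the next. Grouping by trip_id first guards against rows of different trips being interleaved. find_edges is a hypothetical helper; each tuple it returns could then be added with G.add_edge(u, v, weight=minutes, label=str(minutes), route_short_name=route, color=..., penwidth=4).

```python
def to_int_min(time_string):
    """'HH:MM:SS' -> minutes since midnight, ignoring seconds."""
    hours, minutes, _seconds = time_string.split(':')
    return int(hours) * 60 + int(minutes)

def find_edges(stops):
    """Hypothetical helper: return (from_id, to_id, minutes, route) tuples.

    Assumes the rows of each trip appear in travel order.
    """
    # group rows by trip; dicts keep insertion order (Python 3.7+)
    trips = {}
    for stop in stops:
        trips.setdefault(stop['trip_id'], []).append(stop)

    edges = []
    for trip_stops in trips.values():
        # zip pairs each stop with the next one on the same trip
        for cur, nxt in zip(trip_stops, trip_stops[1:]):
            minutes = (to_int_min(nxt['arrival_time'])
                       - to_int_min(cur['departure_time']))
            edges.append((cur['stop_id'], nxt['stop_id'],
                          minutes, cur['route_short_name']))
    return edges

# the two sample rows shown earlier, reduced to the needed columns
sample = [
    {'trip_id': '0002402742018091220190621', 'route_short_name': 'B202',
     'stop_id': '5025', 'arrival_time': '06:27:00', 'departure_time': '06:27:00'},
    {'trip_id': '0002402742018091220190621', 'route_short_name': 'B202',
     'stop_id': '843', 'arrival_time': '06:28:00', 'departure_time': '06:28:00'},
]
print(find_edges(sample))   # [('5025', '843', 1, 'B202')]
```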

5. Hubs

A hub is a node that allows passengers to switch routes, that is, a node touched by at least two different routes.

For example, Trento-Autostaz is touched by three routes, so it is a hub. Let’s examine the node, which we know has stop_id='1', with G.nodes['1']:

{'label': 'Trento\nAutostaz.', 'color': 'black', 'fontcolor': 'black'}

If we examine its incoming edges with G.in_edges('1'), we find edges from stop_id '723' and '870', which represent Trento Via Brescia and Sarche Centro Commerciale respectively:

InEdgeDataView([('870', '1'), ('723', '1')])

If you get a View object, you can easily convert it to a list if needed, for example with list(G.in_edges('1')):

[('870', '1'), ('723', '1')]

The attributes of the source nodes '723' and '870' are:

{'label': 'Trento\nVia\nBrescia\n4', 'color': 'black', 'fontcolor': 'black'}
{'label': 'Sarche\nCentro\nComm.', 'color': 'black', 'fontcolor': 'black'}

There is only one outgoing edge, toward Trento Corso 3 Novembre (stop_id '1108'). Here are G.out_edges('1') and the attributes of node '1108':

OutEdgeDataView([('1', '1108')])
{'label': 'Trento\nC.So\nTre\nNovembre',
 'color': 'black',
 'fontcolor': 'black'}

If, for example, we want to know the route_short_name of this outgoing edge, we can look at the edge data with G['1']['1108']:

{'weight': 5,
 'label': '5',
 'route_short_name': 'B301',
 'color': '#9467bd',
 'penwidth': 4}

If you want to change the color attribute of node '1', you can write:

# G.nodes works in NetworkX 2.x and later (the older G.node alias was removed in 2.4)
G.nodes['1']['color'] = 'red'
G.nodes['1']['fontcolor'] = 'red'

Implement color_hubs

Implement a function which prints the hubs in the graph G as text, and then draws the graph with the hubs colored in red.

NOTE: you don’t need to recalculate the graph; just set the color of the relevant nodes to red


>>> color_hubs(G)
SOLUTION: The hubs are:





def color_hubs(G):
    raise Exception('TODO IMPLEMENT ME !')

color_hubs(G)
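A minimal sketch of the hub test, assuming edges come as (from_id, to_id, attrs) triples, which is the shape G.edges(data=True) yields in NetworkX. find_hubs is a hypothetical helper, and the edge list below is made up for illustration (route names R1 to R3 are placeholders, not the real routes). With the real graph, each returned stop_id n could then be colored via G.nodes[n]['color'] = 'red'.

```python
def find_hubs(edges):
    """Hypothetical helper: sorted stop_ids touched by >= 2 distinct routes.

    edges: iterable of (from_id, to_id, attrs) triples, where attrs is a
    dict with a 'route_short_name' key.
    """
    routes_at = {}   # stop_id -> set of routes touching it
    for u, v, attrs in edges:
        route = attrs['route_short_name']
        # both endpoints of an edge are touched by its route
        routes_at.setdefault(u, set()).add(route)
        routes_at.setdefault(v, set()).add(route)
    return sorted(n for n, routes in routes_at.items() if len(routes) >= 2)

# made-up example: three routes meeting at stop '1'
edges = [('870', '1', {'route_short_name': 'R1'}),
         ('723', '1', {'route_short_name': 'R2'}),
         ('1', '1108', {'route_short_name': 'R3'}),
         ('1108', '1109', {'route_short_name': 'R3'})]
print(find_hubs(edges))   # ['1']
```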

6. plot_timings

To list the edges of G (each edge carries its bus time as 'weight'), use G.edges():

OutEdgeView([('757', '746'), ('746', '857'), ('857', '742'), ('742', '870'), ('870', '1'), ('1', '1108'), ('5025', '843'), ('843', '842'), ('842', '3974'), ('3974', '841'), ('841', '881'), ('881', '723'), ('723', '1'), ('1556', '4392'), ('4392', '4391'), ('4391', '4390'), ('4390', '742'), ('829', '3213'), ('3213', '757'), ('1108', '1109')])

If you get a View, you can iterate through it as if it were a list.

To get the data of an edge, for example the one from '1' to '1108', you can use G['1']['1108']:

{'weight': 5,
 'label': '5',
 'route_short_name': 'B301',
 'color': '#9467bd',
 'penwidth': 4}

Now implement the function plot_timings, which, given a NetworkX DiGraph G, plots a frequency histogram of the times between bus stops (the edge weights).

Expected output:


def plot_timings(G):
    raise Exception('TODO IMPLEMENT ME !')

plot_timings(G)
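A sketch of the two steps, assuming integer weights in minutes and edges given as (from_id, to_id, attrs) triples as produced by G.edges(data=True): first collect the weights, then hand them to matplotlib's plt.hist with one bin per whole minute. edge_weights and plot_weights are hypothetical helpers, and the edge list is made up; with the real graph you would call plot_weights(edge_weights(G.edges(data=True))).

```python
from collections import Counter

def edge_weights(edges):
    """Collect the 'weight' (minutes) of each (from_id, to_id, attrs) edge."""
    return [attrs['weight'] for _u, _v, attrs in edges]

def plot_weights(weights):
    """Draw a frequency histogram of integer weights with matplotlib."""
    import matplotlib.pyplot as plt   # imported only when plotting
    # bin edges at m - 0.5, so each bar is centred on a whole minute
    bins = [m - 0.5 for m in range(min(weights), max(weights) + 2)]
    plt.hist(weights, bins=bins)
    plt.xlabel('minutes between consecutive stops')
    plt.ylabel('number of edges')
    plt.show()

# made-up edges, just to show the data flow
edges = [('a', 'b', {'weight': 2}),
         ('b', 'c', {'weight': 5}),
         ('c', 'd', {'weight': 2})]
weights = edge_weights(edges)
print(Counter(weights))   # the frequencies the histogram would show
```

Centring the bins on whole minutes keeps each bar aligned with one integer travel time, which reads more naturally than matplotlib's default of 10 equal-width bins.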