Visualization solutions

Introduction

We will review the popular library Matplotlib, which can display a variety of charts and is the base of many other visualization libraries.

What to do

  • unzip exercises in a folder, you should get something like this:

visualization
    visualization.ipynb
    visualization-sol.ipynb
    jupman.py
    soft.py

WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder!

  • open Jupyter Notebook from that folder. Two things should open: first a console and then a browser. The browser should show a file list: navigate the list and open the notebook visualization/visualization.ipynb

WARNING 2: DO NOT use the Upload button in Jupyter, instead navigate in the Jupyter browser to the unzipped folder!

  • Go on reading that notebook, and follow the instructions inside.

Shortcut keys:

  • to execute Python code inside a Jupyter cell, press Control + Enter

  • to execute Python code inside a Jupyter cell AND select next cell, press Shift + Enter

  • to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter

  • If the notebook looks stuck, try selecting Kernel -> Restart

First example

Let’s start with a very simple plot:

[2]:
# this is *not* a python command, it is a Jupyter-specific magic command,
# to tell jupyter we want the graphs displayed in the cell outputs
%matplotlib inline

# imports matplotlib
import matplotlib.pyplot as plt

# we can give coordinates as simple lists of numbers
# these are couples for the function y = 2 * x
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]

plt.plot(xs, ys)

# title and labels can be set after the plot call, the order doesn't matter
plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')

# prevents showing '<matplotlib.text.Text at 0x7fbcf3c4ff28>' in Jupyter
plt.show()
../_images/visualization_visualization-sol_3_0.png

Plot style

To change the way the line is displayed, you can pass a format string as a third parameter. For example, to display red dots, you would add the string 'ro', where r stands for red and o stands for a circle marker (a dot).

[3]:
%matplotlib inline
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]

plt.plot(xs, ys, 'ro')  # NOW USING RED DOTS

plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')

plt.show()
../_images/visualization_visualization-sol_5_0.png
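
A format string can combine a color letter with a marker or with a line style. The following small sketch is not part of the original notebook: it draws the same data with two other format strings, and the particular choices ('b--' and 'k^') are only illustrative.

```python
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]

# 'b--' : blue dashed line, no markers
dashed = plt.plot(xs, ys, 'b--')

# 'k^'  : black triangles, shifted up by 2 so they don't overlap the line
triangles = plt.plot(xs, [y + 2 for y in ys], 'k^')

plt.title("format strings")
plt.xlabel('x')
plt.ylabel('y')
plt.show()
```

plt.plot returns a list with the line objects it created, which is handy if you want to inspect or tweak a line later.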

x power 2 exercise

Try to display the function y = x**2 (x to the power of 2) using green dots, for integer xs going from -10 to 10

[4]:
# write here the solution


[5]:

../_images/visualization_visualization-sol_11_0.png

Axis limits

If you want to change the displayed range of the x axis, you can use plt.xlim; plt.ylim does the same for the y axis:

[6]:
%matplotlib inline
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]

plt.plot(xs, ys, 'ro')

plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')

plt.xlim(-5, 10)  # SETS LOWER X DISPLAY TO -5 AND UPPER TO 10
plt.ylim(-7, 26)  # SETS LOWER Y DISPLAY TO -7 AND UPPER TO 26

plt.show()
../_images/visualization_visualization-sol_13_0.png

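Axis size

You can set the size of the whole figure by passing figsize (width and height, in inches) to plt.figure before plotting: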

[7]:
%matplotlib inline
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]

fig = plt.figure(figsize=(10,3))  # width: 10 inches, height 3 inches

plt.plot(xs, ys, 'ro')

plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')


plt.show()

../_images/visualization_visualization-sol_15_0.png

Changing tick labels

You can also change the labels displayed on the axis ticks with the plt.xticks and plt.yticks functions:

Note: instead of xticks you might directly use categorical variables IF you have matplotlib >= 2.1.0

Here we use xticks as sometimes you might need to fiddle with them anyway

[8]:
%matplotlib inline
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]

plt.plot(xs, ys, 'ro')

plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')

# FIRST NEEDS A SEQUENCE WITH THE POSITIONS, THEN A SEQUENCE OF SAME LENGTH WITH LABELS
plt.xticks(xs, ['a', 'b', 'c', 'd', 'e', 'f'])
plt.show()
../_images/visualization_visualization-sol_17_0.png
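
As the note above says, Matplotlib 2.1.0 and later accept strings directly as x values and treat them as categories. A minimal sketch of that alternative, assuming a recent enough Matplotlib:

```python
import matplotlib.pyplot as plt

labels = ['a', 'b', 'c', 'd', 'e', 'f']
ys = [2, 4, 6, 8, 10, 12]

# strings are placed at positions 0, 1, 2, ... and used as tick labels
points = plt.plot(labels, ys, 'ro')

plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')
plt.show()
```

This is shorter, but you give up explicit control over the tick positions, which is why knowing plt.xticks is still useful.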

Introducing numpy

For functions involving real numbers, vanilla Python starts showing its limits and it’s better to switch to the numpy library. Matplotlib can easily handle both vanilla Python sequences like lists and numpy arrays. Let’s see an example without numpy and one with it.

Example without numpy

If we only use vanilla Python (that is, Python without extra libraries like numpy), to display the function y = 2x + 1 we can come up with a solution like this:

[9]:

%matplotlib inline
import matplotlib.pyplot as plt

xs = [x*0.1 for x in range(10)]   # notice we can't do a range with float increments
                                  # (and it would also introduce rounding errors)
ys = [(x * 2) + 1 for x in xs]

plt.plot(xs, ys, 'bo')

plt.title("y = 2x + 1  with vanilla python")
plt.xlabel('x')
plt.ylabel('y')

plt.show()
../_images/visualization_visualization-sol_21_0.png
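
The rounding errors mentioned in the comment above are easy to observe. This sketch, added for illustration, compares multiplying an integer index by 0.1 with repeatedly adding 0.1: neither is exact, and math.isclose is one way to compare such values safely.

```python
import math

# multiplying an integer index by the step, as in the cell above
xs = [x * 0.1 for x in range(10)]
print(xs[3])                      # 0.30000000000000004, not exactly 0.3
print(xs[3] == 0.3)               # False

# repeatedly adding the step accumulates the error at every iteration
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)               # False

# comparing with a tolerance avoids the problem
print(math.isclose(xs[3], 0.3))   # True
print(math.isclose(total, 1.0))   # True
```

For plotting these tiny differences are invisible, but they matter whenever you test floats for equality.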

Example with numpy

With numpy, we have at our disposal several new methods for dealing with arrays.

First we can generate an interval of values with one of these methods.

Since Python range does not allow float increments, we can use np.arange:

[10]:
import numpy as np

xs = np.arange(0,1.0,0.1)
xs
[10]:
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

Equivalently, we could use np.linspace:

[11]:
xs = np.linspace(0,0.9,10)

xs
[11]:
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
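
The two calls differ in how they treat the right end and in what the third parameter means. The sketch below, added for illustration, puts them side by side. The NumPy documentation suggests preferring np.linspace when the step is not an integer, because with np.arange rounding can make the number of generated values hard to predict.

```python
import numpy as np

a = np.arange(0, 1.0, 0.1)     # start, stop (excluded), step
b = np.linspace(0, 0.9, 10)    # start, stop (included), number of values

print(len(a), len(b))          # 10 10
print(np.allclose(a, b))       # True: equal up to floating point rounding
```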

Numpy allows us to write functions on arrays in a natural manner: arithmetic operators are applied to every element. For example, to calculate ys we can now write:

[12]:
ys = 2*xs + 1

ys
[12]:
array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8])

Let’s put everything together:

[13]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

xs = np.linspace(0,0.9,10)  # left end: 0 *included*  right end: 0.9  *included*   number of values: 10
ys = 2*xs + 1

plt.plot(xs, ys, 'bo')

plt.title("y = 2x + 1  with numpy")
plt.xlabel('x')
plt.ylabel('y')

plt.show()
../_images/visualization_visualization-sol_29_0.png

y = sin(x) + 3 exercise

✪✪✪ Try to display the function y = sin(x) + 3 for x at pi/4 intervals, starting from 0. Use exactly 8 ticks.

NOTE: 8 is the number of x ticks (telecom people would use the term ‘samples’), NOT the x of the last tick !!

  a) try to solve it without using numpy. For pi, use the constant math.pi (you first need to import the math module)

  b) try to solve it with numpy. For pi, use the constant np.pi (which has exactly the same value as math.pi)

     b.1) solve it with np.arange

     b.2) solve it with np.linspace

  c) For each tick, use the label sequence "0π/4", "1π/4", "2π/4", "3π/4", "4π/4", "5π/4", ... . Writing them by hand is easy; try instead to devise a method that works for any number of ticks. What changes in the sequence? What stays constant? What is the type of the part that changes? What is the final type of the labels you want to obtain?

  d) If you are in the mood, try to display them better, like 0, π/4, π/2, 3π/4, π, 5π/4, possibly using Latex (requires some searching, this example might be a starting point)

NOTE: Latex makes heavy use of the backslash \, as in \frac{2,3}. If we write it directly in a regular string, Python interprets \f as a special character (form feed) and the Latex processor will not receive the string we meant:

[14]:
'\frac{2,3}'
[14]:
'\x0crac{2,3}'

One solution would be to double the backslashes, like this:

[15]:
'\\frac{2,3}'
[15]:
'\\frac{2,3}'

An even better one is to prefix the string with the r character (a raw string), which allows us to write each backslash only once:

[16]:
r'\frac{2,3}'
[16]:
'\\frac{2,3}'
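
To convince yourself that the three spellings really produce different (or identical) strings, you can compare them directly. This is a small check added for illustration; the variable names are arbitrary.

```python
plain = '\frac{2,3}'      # \f is read as a single form feed character
doubled = '\\frac{2,3}'   # backslash escaped by hand
raw = r'\frac{2,3}'       # raw string: backslashes are kept as typed

print(len(plain), len(doubled), len(raw))   # 9 10 10
print(doubled == raw)                       # True
print(plain == raw)                         # False
```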
[17]:
# write here solution for a) y = sin(x) + 3 with vanilla python


[18]:

../_images/visualization_visualization-sol_40_0.png
[19]:
# write here solution b.1)      y = sin(x) + 3 with numpy, arange
[20]:

../_images/visualization_visualization-sol_45_0.png
[21]:
# write here solution b.2)      y = sin(x) + 3 with numpy, linspace


[22]:

../_images/visualization_visualization-sol_50_0.png
[23]:
# write here solution c)        y = sin(x) + 3 with numpy and pi xlabels


[24]:

../_images/visualization_visualization-sol_58_0.png

Showing degrees per node

Going back to the indegrees and outdegrees seen in the Simple statistics paragraph of Graph formats, we will try to study their distributions visually.

Let’s take an example networkx DiGraph:

[25]:
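import networkx as nx
from soft import draw_nx   # drawing helper, assumed to be provided by soft.py in the exercise folder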

G1=nx.DiGraph({
    'a':['b','c'],
    'b':['b','c', 'd'],
    'c':['a','b','d'],
    'd':['b', 'd']
})

draw_nx(G1)
../_images/visualization_visualization-sol_61_0.png

indegree per node

✪✪ Display a plot for graph G1 where the xtick labels are the nodes, and the y is the indegree of those nodes.

Note: instead of xticks you might directly use categorical variables IF you have matplotlib >= 2.1.0

Here we use xticks as sometimes you might need to fiddle with them anyway

To get the nodes, you can use the G1.nodes() method:

[26]:
G1.nodes()
[26]:
NodeView(('d', 'a', 'b', 'c'))

It gives back a NodeView, which is not a list, but you can still iterate through it with a for loop:

[27]:
for n in G1.nodes():
    print(n)
d
a
b
c

Also, you can get the indegree of a node with the in_degree method:

[28]:
G1.in_degree('b')
[28]:
4
[29]:
# write here the solution


[30]:

../_images/visualization_visualization-sol_75_0.png

Bar plots

The previous plot with dots doesn’t look so good: we might try a bar plot instead. First look at this example, then proceed with the next exercise.

[31]:
import numpy as np
import matplotlib.pyplot as plt

xs = [1,2,3,4]
ys = [7,5,8,2 ]

plt.bar(xs, ys,
        0.5,             # the width of the bars
        color='green',   # someone suggested the default blue color is depressing, so let's put green
        align='center')  # bars are centered on the xtick

plt.show()
../_images/visualization_visualization-sol_77_0.png

indegree per node bar plot

✪✪ Display a bar plot for graph G1 where the xtick labels are the nodes, and the y is the indegree of those nodes.

[32]:
# write here


[33]:

../_images/visualization_visualization-sol_86_0.png

indegree per node sorted alphabetically

✪✪ Display the same bar plot as before, but now sort nodes alphabetically.

NOTE: you cannot call the .sort() method on the result given by G1.nodes(), because nodes in networkx have no inherent order by default. To use .sort() you first need to convert the result to a list object.

[34]:

../_images/visualization_visualization-sol_91_0.png
[35]:
# write here


indegree per node sorted

✪✪✪ Display the same bar plot as before, but now sort nodes according to their indegree. This is more challenging: to do it you need a sorting trick. First read the Python documentation and then:

  1. create a list of couples (list of tuples) where each tuple is the node identifier and the corresponding indegree

  2. sort the list by using the second value of the tuples as a key.
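
The two steps above can be sketched with plain Python as follows. The indegree values are written by hand from G1 (with networkx they would come from G1.in_degree), and the variable names are only illustrative.

```python
# (node, indegree) couples for G1, written by hand for illustration
couples = [('a', 1), ('b', 4), ('c', 2), ('d', 3)]

# sort by the second element of each tuple
couples_sorted = sorted(couples, key=lambda t: t[1])
print(couples_sorted)   # [('a', 1), ('c', 2), ('d', 3), ('b', 4)]

# split back into two parallel sequences, ready for plotting
nodes = [t[0] for t in couples_sorted]
indegrees = [t[1] for t in couples_sorted]
print(nodes)       # ['a', 'c', 'd', 'b']
print(indegrees)   # [1, 2, 3, 4]
```

sorted returns a new list and leaves the original untouched; list.sort(key=...) would sort in place instead.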

[36]:
# write here


[37]:

../_images/visualization_visualization-sol_104_0.png

out degrees per node sorted

✪✪✪ Make the same plot as before for the outdegrees.

You can get the outdegree of a node with:

[38]:
G1.out_degree('b')
[38]:
3
[39]:

../_images/visualization_visualization-sol_111_0.png
[40]:
# write here


degrees per node

✪✪✪ We might also check the sorted degrees per node, where the degree is the sum of in_degree and out_degree. To get the sum, use the G1.degree(node) method.

[41]:
# write here the solution


[42]:

../_images/visualization_visualization-sol_121_0.png

✪✪✪✪ EXERCISE: Look at this example, and make a double bar chart sorting nodes by their total degree. To do so, in the tuples you will need vertex, in_degree, out_degree and also degree.

[43]:
# write here


[44]:

../_images/visualization_visualization-sol_130_0.png

Frequency histogram

Now let’s try to draw degree frequencies, that is, for each degree present in the graph we want to display a bar as high as the number of times that particular degree appears.

To do so, we will need a Matplotlib histogram, see documentation.

We will need to tell matplotlib how many columns we want, which in histogram terms are called bins. We also need to give the histogram a series of numbers so it can count how many times each number occurs. Let’s consider this graph G2:

[45]:
import networkx as nx

G2=nx.DiGraph({
    'a':['b','c'],
    'b':['b','c', 'd'],
    'c':['a','b','d'],
    'd':['b', 'd','e'],
    'e':[],
    'f':['c','d','e'],
    'g':['e','g']
})


draw_nx(G2)



../_images/visualization_visualization-sol_133_0.png

If we take the degree sequence of G2 we get this:

[46]:
degrees_G2 = [G2.degree(n) for n in G2.nodes()]

degrees_G2
[46]:
[7, 3, 7, 3, 3, 3, 6]

We see 3 appears four times, 6 once, and 7 twice.
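
Counting by eye works for seven values, but not for a large graph. As a quick check, added here for illustration, collections.Counter from the standard library gives the same frequencies; the degree sequence is copied from the output above.

```python
from collections import Counter

degrees_G2 = [7, 3, 7, 3, 3, 3, 6]   # the degree sequence shown above

frequencies = Counter(degrees_G2)
for degree in sorted(frequencies):
    print(degree, frequencies[degree])
# 3 4
# 6 1
# 7 2
```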

Let’s try to determine a good number for the bins. First we can check the boundaries our x axis should have:

[47]:
min(degrees_G2)
[47]:
3
[48]:
max(degrees_G2)
[48]:
7

So on the x axis our histogram must span at least from 3 to 7. If we want integer columns (bins), we need ticks going from 3 included to 7 included, that is 3, 4, 5, 6, 7. To get a precise display when the x values are integers, it is best to also provide the sequence of bin edges manually, remembering it should start from the minimum (in our case, 3) and end at the maximum + 1 included (in our case, 7 + 1 = 8).

NOTE: precise histogram drawing can be quite tricky, please do read this StackOverflow post for more details about it.

[49]:

import matplotlib.pyplot as plt
import numpy as np

degrees_G2 = [G2.degree(n) for n in G2.nodes()]

# add histogram

# hist returns a tuple of three values (counts, bin edges, drawn bars)
# which we unpack into three variables
n, bins, columns = plt.hist(degrees_G2,
                            bins=range(3,9),  #  bin edges: 3 *included* , 4, 5, 6, 7, 8 *included*
                            width=1.0)        #  graphical width of the bars

plt.xlabel('Degrees')
plt.ylabel('Frequency counts')
plt.title('G2 Degree distribution')
plt.xlim(0, max(degrees_G2) + 2)
plt.show()

../_images/visualization_visualization-sol_140_0.png

As expected we see 3 is counted four times, 6 once, and 7 twice.
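
To see why the bin edges should reach the maximum + 1, it helps to look at the counts without drawing anything. plt.hist relies on numpy for the binning, so this sketch calls np.histogram directly, once with edges up to 8 and once stopping at 7. The degree sequence is copied from the output above.

```python
import numpy as np

degrees_G2 = [7, 3, 7, 3, 3, 3, 6]

# edges 3,4,5,6,7,8 -> bins [3,4) [4,5) [5,6) [6,7) [7,8]
counts, edges = np.histogram(degrees_G2, bins=range(3, 9))
print(counts)         # [4 0 0 1 2]

# stopping at the maximum: the last bin [6,7] is closed on both sides,
# so degrees 6 and 7 end up merged in the same column
counts_short, _ = np.histogram(degrees_G2, bins=range(3, 8))
print(counts_short)   # [4 0 0 3]
```

Every bin includes its left edge and excludes its right edge, except the last one, which includes both.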

✪✪✪ EXERCISE: Still, it would be visually better to align the x ticks to the middle of the bars with xticks, and also to make the plot tighter by setting the xlim appropriately. This is not always easy to do.

Read this StackOverflow post carefully and try to do it by yourself.

NOTE: set one thing at a time and check that it works (i.e. first xticks and then xlim); doing everything at once might get quite confusing.

[50]:
# write here the solution


[51]:

../_images/visualization_visualization-sol_147_0.png

Showing plots side by side

You can display plots on a grid. Each cell in the grid is identified by a single number, counting from 1 row by row. For example, for a grid of two rows and three columns, you would have cells indexed like this:

1 2 3
4 5 6
[52]:
%matplotlib inline
import matplotlib.pyplot as plt
import math

xs = [1,2,3,4,5,6]

# cells:
# 1 2 3
# 4 5 6

plt.subplot(2,   # 2 rows
            3,   # 3 columns
            1)   # plotting in first cell
ys1 = [x**3 for x in xs]
plt.plot(xs, ys1)
plt.title('first cell')


plt.subplot(2,   # 2 rows
            3,   # 3 columns
            2)   # plotting in second cell

ys2 = [2*x + 1 for x in xs]
plt.plot(xs,ys2)
plt.title('2nd cell')


plt.subplot(2,   # 2 rows
            3,   # 3 columns
            3)   # plotting in third cell

ys3 = [-2*x + 1 for x in xs]
plt.plot(xs,ys3)
plt.title('3rd cell')


plt.subplot(2,   # 2 rows
            3,   # 3 columns
            4)   # plotting in fourth cell

ys4 = [-2*x**2 for x in xs]
plt.plot(xs,ys4)
plt.title('4th cell')


plt.subplot(2,   # 2 rows
            3,   # 3 columns
            5)   # plotting in fifth cell

ys5 = [math.sin(x) for x in xs]
plt.plot(xs,ys5)
plt.title('5th cell')


plt.subplot(2,   # 2 rows
            3,   # 3 columns
            6)   # plotting in sixth cell

ys6 = [-math.cos(x) for x in xs]
plt.plot(xs,ys6)
plt.title('6th cell')

plt.show()
../_images/visualization_visualization-sol_149_0.png
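
The six blocks above differ only in the cell index, the function and the title, so they can also be written as a loop. This is one possible restructuring, not the notebook's own code; the list of (title, function) couples and the call to plt.tight_layout are additions made for illustration.

```python
import math
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5, 6]

# (title, function) couples, in the order of the cells
plots = [
    ('first cell', lambda x: x**3),
    ('2nd cell',   lambda x: 2*x + 1),
    ('3rd cell',   lambda x: -2*x + 1),
    ('4th cell',   lambda x: -2*x**2),
    ('5th cell',   lambda x: math.sin(x)),
    ('6th cell',   lambda x: -math.cos(x)),
]

fig = plt.figure()
for i, (title, f) in enumerate(plots, start=1):   # cell indexes start from 1
    plt.subplot(2, 3, i)                          # 2 rows, 3 columns, i-th cell
    plt.plot(xs, [f(x) for x in xs])
    plt.title(title)

plt.tight_layout()   # optional: reduces overlap between titles and axes
plt.show()
```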

Graph models

Let’s study frequencies of some known network types.

Erdős–Rényi model

✪✪ A simple graph model we can think of is the so-called Erdős–Rényi model: it is an undirected graph with n nodes, where each pair of nodes is connected with probability p. In networkx, we can generate a random one with this command:

[53]:
G = nx.erdos_renyi_graph(10, 0.5)

In the drawing, the absence of arrows confirms it is undirected:

[54]:
draw_nx(G)
../_images/visualization_visualization-sol_155_0.png
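
To make the definition of the model concrete, here is a plain Python sketch of how such a graph could be generated: every unordered pair of distinct nodes gets an edge independently with probability p. The function erdos_renyi_edges is a hypothetical helper written for this page, not the networkx implementation, and in the exercise you should keep using nx.erdos_renyi_graph.

```python
import random
from itertools import combinations

def erdos_renyi_edges(n, p, seed=None):
    """Return the edge list of a random undirected graph on nodes 0..n-1.
    Hypothetical helper written for illustration, not the networkx code."""
    rng = random.Random(seed)
    # every unordered pair of distinct nodes gets an edge with probability p
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

n, p = 10, 0.5
edges = erdos_renyi_edges(n, p, seed=0)

# degree of each node: number of edges touching it
degrees = [0] * n
for u, v in edges:
    degrees[u] += 1
    degrees[v] += 1

print(len(edges), "edges out of", n * (n - 1) // 2, "possible pairs")
print(degrees)
```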

Try plotting degree distribution for different values of p (0.1, 0.5, 0.9) with a fixed n=1000, putting them side by side on the same row. What does their distribution look like ? Where are they centered ?

To avoid rewriting the same code again and again, define a plot_erdos(n,p,j) function to be called three times.

[55]:
# write here the solution


[56]:


                                           Erdős–Rényi degree distribution SOLUTION
../_images/visualization_visualization-sol_161_1.png

Other plots

Matplotlib can display pretty much any chart you might like. Here we collect some we use in the course; for others, see the extensive Matplotlib documentation.

Pie chart

[57]:
%matplotlib inline
import matplotlib.pyplot as plt

labels = ['Oranges', 'Apples', 'Cucumbers']
fracs = [14, 23, 5]   # size of each sector; note they don't need to add up to 100

plt.pie(fracs, labels=labels, autopct='%1.1f%%', shadow=True)
plt.title("Super strict vegan diet (good luck)")
plt.show()
../_images/visualization_visualization-sol_164_0.png
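
The values don't need to add up to 100 because plt.pie divides each one by their sum. This sketch, added for illustration, reproduces by hand the percentages that autopct prints on the sectors, using the same format string.

```python
fracs = [14, 23, 5]
labels = ['Oranges', 'Apples', 'Cucumbers']

total = sum(fracs)   # 42

# same format string passed to autopct in the cell above
formatted = ['%1.1f%%' % (100 * frac / total) for frac in fracs]

for label, text in zip(labels, formatted):
    print(label, text)
# Oranges 33.3%
# Apples 54.8%
# Cucumbers 11.9%
```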