# Visualization solutions¶


## Introduction¶

We will review the well-known Matplotlib library, which can display a wide variety of charts and is the base of many other visualization libraries.

### What to do¶

• unzip exercises in a folder, you should get something like this:

visualization
visualization.ipynb
visualization-sol.ipynb
jupman.py
soft.py


WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !

• open Jupyter Notebook from that folder. Two things should open, first a console and then a browser. The browser should show a file list: navigate the list and open the notebook visualization/visualization.ipynb

WARNING 2: DO NOT use the Upload button in Jupyter, instead navigate in Jupyter browser to the unzipped folder !

Shortcut keys:

• to execute Python code inside a Jupyter cell, press Control + Enter

• to execute Python code inside a Jupyter cell AND select next cell, press Shift + Enter

• to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter

• If the notebooks look stuck, try to select Kernel -> Restart

## First example¶

[2]:

# this is *not* a python command, it is a Jupyter-specific magic command,
# to tell jupyter we want the graphs displayed in the cell outputs
%matplotlib inline

# imports matplotlib
import matplotlib.pyplot as plt

# we can give coordinates as simple lists of numbers
# these are pairs for the function y = 2 * x
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]

plt.plot(xs, ys)

# title and labels can be set after the plot call, the order doesn't matter
plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')

# prevents showing '<matplotlib.text.Text at 0x7fbcf3c4ff28>' in Jupyter
plt.show()


### Plot style¶

To change the way the line is displayed, you can pass a format string as a third parameter. For example, to display red dots, you would add the string 'ro', where r stands for red and o stands for the circle marker.

[3]:

%matplotlib inline
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]

plt.plot(xs, ys, 'ro')  # NOW USING RED DOTS

plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')

plt.show()


### x power 2 exercise¶

Try to display the function y = x**2 (x power 2) using green dots and for integer xs going from -10 to 10
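As a hedged hint about the data only (a sketch, not the official solution): range excludes its upper bound, so reaching 10 requires range(-10, 11). The names xs and ys simply follow the earlier examples.

```python
# integer xs from -10 to 10: range excludes the stop value, hence 11
xs = list(range(-10, 11))
ys = [x**2 for x in xs]

print(xs[0], xs[-1], len(xs))
print(ys[0], ys[10], ys[-1])
```

These lists can then be passed to plt.plot together with the format string 'go' (g for green, o for the circle marker).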

[4]:

# write here the solution






### Axis limits¶

If you want to change the x axis, you can use plt.xlim:

[6]:

%matplotlib inline
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]

plt.plot(xs, ys, 'ro')

plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')

plt.xlim(-5, 10)  # SETS LOWER X DISPLAY TO -5 AND UPPER TO 10
plt.ylim(-7, 26)  # SETS LOWER Y DISPLAY TO -7 AND UPPER TO 26

plt.show()


### Axis size¶

[7]:

%matplotlib inline
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]

fig = plt.figure(figsize=(10,3))  # width: 10 inches, height 3 inches

plt.plot(xs, ys, 'ro')

plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')

plt.show()



### Changing tick labels¶

You can also change the labels displayed on the axis ticks with the plt.xticks and plt.yticks functions:

Note: instead of xticks you might directly use categorical variables IF you have matplotlib >= 2.1.0

Here we use xticks as sometimes you might need to fiddle with them anyway

[8]:

%matplotlib inline
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]

plt.plot(xs, ys, 'ro')

plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')

# FIRST NEEDS A SEQUENCE WITH THE POSITIONS, THEN A SEQUENCE OF SAME LENGTH WITH LABELS
plt.xticks(xs, ['a', 'b', 'c', 'd', 'e', 'f'])
plt.show()


## Introducing numpy¶

For functions involving real numbers, vanilla Python starts showing its limits and it's better to switch to the numpy library. Matplotlib can easily handle both vanilla Python sequences like lists and numpy arrays. Let’s see an example without numpy and one with it.

### Example without numpy¶

If we only use vanilla Python (that is, Python without extra libraries like numpy), to display the function y = 2x + 1 we can come up with a solution like this

[9]:


%matplotlib inline
import matplotlib.pyplot as plt

xs = [x*0.1 for x in range(10)]   # range does not accept float increments, so we scale integers
# (repeatedly adding a float step would also accumulate rounding errors)
ys = [(x * 2) + 1 for x in xs]

plt.plot(xs, ys, 'bo')

plt.title("y = 2x + 1  with vanilla python")
plt.xlabel('x')
plt.ylabel('y')

plt.show()
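To make the remark about rounding errors concrete, here is a minimal sketch in plain Python. The variable names accumulated and scaled are only illustrative; the comparison assumes the usual IEEE 754 double precision floats used by CPython.

```python
import math

# Approach 1: repeatedly add a float step (rounding errors accumulate)
accumulated = []
x = 0.0
for _ in range(11):
    accumulated.append(x)
    x += 0.1

# Approach 2: scale integers, as in the cell above (one rounding per value)
scaled = [i * 0.1 for i in range(11)]

print(accumulated[-1])   # value reached after ten additions of 0.1
print(scaled[-1])        # 10 * 0.1
print(math.isclose(accumulated[-1], scaled[-1]))
```

The two last values are extremely close but not identical, which is why comparing floats with == is usually avoided in favour of a tolerance.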


### Example with numpy¶

With numpy, we have at our disposal several new methods for dealing with arrays.

First we can generate an interval of values with one of these methods.

Since Python range does not allow float increments, we can use np.arange:

[10]:

import numpy as np

xs = np.arange(0,1.0,0.1)
xs

[10]:

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])


Equivalently, we could use np.linspace:

[11]:

xs = np.linspace(0,0.9,10)

xs

[11]:

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
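The difference between the two calls is in how the right end is treated. The following sketch, using only the calls already shown, puts them side by side; the names xs_arange and xs_linspace are illustrative.

```python
import numpy as np

# arange: start, stop (*excluded*), step
xs_arange = np.arange(0, 1.0, 0.1)

# linspace: start, stop (*included*), number of values
xs_linspace = np.linspace(0, 0.9, 10)

print(len(xs_arange), len(xs_linspace))
# compare with a tolerance, since the floats may differ in the last digits
print(np.allclose(xs_arange, xs_linspace))
```

When the number of points matters more than the step, np.linspace tends to be the safer choice because the length of the result is stated explicitly.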


Numpy allows us to write functions on arrays in a natural manner. For example, to calculate ys we can now write:

[12]:

ys = 2*xs + 1

ys

[12]:

array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8])
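As a quick check, the array expression gives the same values as the list comprehension used in the vanilla Python example. This is only a sketch; ys_numpy and ys_python are illustrative names.

```python
import numpy as np

xs = np.linspace(0, 0.9, 10)

ys_numpy = 2*xs + 1                      # operates on the whole array at once
ys_python = [(x * 2) + 1 for x in xs]    # explicit element-by-element loop

print(np.allclose(ys_numpy, ys_python))
```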


Let’s put everything together:

[13]:

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

xs = np.linspace(0,0.9,10)  # left end: 0 *included*  right end: 0.9  *included*   number of values: 10
ys = 2*xs + 1

plt.plot(xs, ys, 'bo')

plt.title("y = 2x + 1  with numpy")
plt.xlabel('x')
plt.ylabel('y')

plt.show()


### y = sin(x) + 3 exercise¶

✪✪✪ Try to display the function y = sin(x) + 3 for x at pi/4 intervals, starting from 0. Use exactly 8 ticks.

NOTE: 8 is the number of x ticks (telecom people would use the term ‘samples’), NOT the x of the last tick !!

a) try to solve it without using numpy. For pi, use the constant math.pi (first you need to import the math module)

b) try to solve it with numpy. For pi, use the constant np.pi (which has exactly the same value as math.pi)

b.1) solve it with np.arange

b.2) solve it with np.linspace

c) For each tick, use the label sequence "0π/4", "1π/4", "2π/4", "3π/4", "4π/4", "5π/4", ... Obviously writing them by hand is easy, try instead to devise a method that works for any number of ticks. What is changing in the sequence? What is constant? What is the type of the part that changes? What is the final type of the labels you want to obtain?

d) If you are in the mood, try to display them better, like 0, π/4, π/2, 3π/4, π, 5π/4, possibly using LaTeX (requires some search, this example might be a starting point)
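One possible way to build the label sequence for any number of ticks is sketched below. It is a hint under our own naming (n_ticks, xs, labels), not necessarily the official solution.

```python
import math

n_ticks = 8

# x positions at pi/4 intervals, starting from 0
xs = [i * math.pi / 4 for i in range(n_ticks)]

# the integer counter changes, the suffix "π/4" stays constant;
# str() converts the integer so that the result is a string
labels = [str(i) + "π/4" for i in range(n_ticks)]

print(labels)
```

The two lists have the same length, so they can be given to plt.xticks as positions and labels respectively.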

NOTE: LaTeX often involves the usage of the backslash character \, like in \frac{2,3}. If we write it directly, Python will interpret \f as a special character (form feed) and will not send to the LaTeX processor the string we meant:

[14]:

'\frac{2,3}'

[14]:

'\x0crac{2,3}'


One solution would be to double the backslashes, like this:

[15]:

'\\frac{2,3}'

[15]:

'\\frac{2,3}'


An even better one is to prefix the string with the r character (making it a raw string), which allows us to write backslashes only once:

[16]:

r'\frac{2,3}'

[16]:

'\\frac{2,3}'
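The three spellings can be compared directly. In this small sketch the names plain, doubled and raw are just illustrative:

```python
plain = '\frac{2,3}'      # \f is turned into a single form feed character
doubled = '\\frac{2,3}'   # each \\ yields one backslash
raw = r'\frac{2,3}'       # raw string: backslashes are kept as typed

print(len(plain), len(doubled), len(raw))
print(doubled == raw)
print(plain == raw)
```

The doubled and raw versions are the same string, while the plain one is one character shorter because \f collapsed into a single character.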

[17]:

# write here solution for a) y = sin(x) + 3 with vanilla python





[19]:

# write here solution b.1)      y = sin(x) + 3 with numpy, arange




[21]:

# write here solution b.2)      y = sin(x) + 3 with numpy, linspace





[23]:

# write here solution c)        y = sin(x) + 3 with numpy and pi xlabels






### Showing degrees per node¶

Going back to the indegrees and outdegrees as seen in Graph formats - Simple statistics paragraph, we will try to study the distributions visually.

Let’s take an example networkx DiGraph:

[25]:

import networkx as nx

G1=nx.DiGraph({
'a':['b','c'],
'b':['b','c', 'd'],
'c':['a','b','d'],
'd':['b', 'd']
})

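# draw_nx is not part of networkx: it is assumed to be a course helper
# (for example from soft.py) that has been imported beforehand
draw_nx(G1)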


### indegree per node¶

✪✪ Display a plot for graph G1 where the xtick labels are the nodes, and the y is the indegree of those nodes.

Note: instead of xticks you might directly use categorical variables IF you have matplotlib >= 2.1.0

Here we use xticks as sometimes you might need to fiddle with them anyway

To get the nodes, you can use the G1.nodes() function:

[26]:

G1.nodes()

[26]:

NodeView(('d', 'a', 'b', 'c'))


It gives back a NodeView which is not a list, but you can still iterate through it with a for ... in loop:

[27]:

for n in G1.nodes():
    print(n)

d
a
b
c


Also, you can get the indegree of a node with

[28]:

G1.in_degree('b')

[28]:

4
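To see where this number comes from, the indegrees can also be counted by hand from the same adjacency used to build G1. The sketch below deliberately avoids networkx; adjacency and indegrees are illustrative names.

```python
# same adjacency as G1: each key lists the nodes its outgoing edges point to
adjacency = {
    'a': ['b', 'c'],
    'b': ['b', 'c', 'd'],
    'c': ['a', 'b', 'd'],
    'd': ['b', 'd'],
}

# indegree = number of edges arriving at a node (the self-loop b -> b counts once)
indegrees = {node: 0 for node in adjacency}
for source, targets in adjacency.items():
    for target in targets:
        indegrees[target] += 1

print(indegrees)
```

For node 'b' the count agrees with the value returned by G1.in_degree('b') above.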

[29]:

# write here the solution






## Bar plots¶

The previous plot with dots doesn’t look so good - we might try to use a bar plot instead. First look at this example, then proceed with the next exercise.

[31]:

import numpy as np
import matplotlib.pyplot as plt

xs = [1,2,3,4]
ys = [7,5,8,2 ]

plt.bar(xs, ys,
        0.5,             # the width of the bars
        color='green',   # someone suggested the default blue color is depressing, so let's put green
        align='center')  # bars are centered on the xtick

plt.show()


### indegree per node bar plot¶

✪✪ Display a bar plot for graph G1 where the xtick labels are the nodes, and the y is the indegree of those nodes.

[32]:

# write here






### indegree per node sorted alphabetically¶

✪✪ Display the same bar plot as before, but now sort nodes alphabetically.

NOTE: you cannot run the .sort() method on the result given by G1.nodes(), because nodes in networkx by default have no inherent order. To use .sort() you first need to convert the result to a list object.

[35]:

# write here



### indegree per node sorted¶

✪✪✪ Display the same bar plot as before, but now sort nodes according to their indegree. This is more challenging: to do it you need to use a sorting trick. First read the Python documentation and then:

1. create a list of pairs (list of tuples) where each tuple holds the node identifier and the corresponding indegree

2. sort the list by using the second value of the tuples as a key.
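The two steps above can be sketched with plain Python. The pairs are hard-coded illustrative values rather than being read from the graph, and the names pairs, by_indegree, nodes and values are our own.

```python
# illustrative (node, indegree) pairs, hard-coded for this sketch
pairs = [('a', 1), ('b', 4), ('c', 2), ('d', 3)]

# key= receives each tuple and returns the value to sort by (the indegree)
by_indegree = sorted(pairs, key=lambda pair: pair[1])

# split back into two sequences, usable as tick labels and bar heights
nodes = [node for node, _ in by_indegree]
values = [value for _, value in by_indegree]

print(nodes)
print(values)
```

sorted returns a new list and leaves pairs untouched; calling pairs.sort(key=...) would instead sort the list in place.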

[36]:

# write here






### out degrees per node sorted¶

✪✪✪ Make the same plot as before for the outdegrees.

You can get the outdegree of a node with:

[38]:

G1.out_degree('b')

[38]:

3

[40]:

# write here



### degrees per node¶

✪✪✪ We might also check the sorted degrees per node, where the degree is the sum of in_degree and out_degree. To get the sum, use the G1.degree(node) function.

[41]:

# write here the solution






✪✪✪✪ EXERCISE: Look at this example, and make a double bar chart sorting nodes by their total degree. To do so, in the tuples you will need the vertex, in_degree, out_degree and also degree.
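As a hedged starting point for the data preparation only, the sketch below builds the four-field tuples, sorts them by total degree and computes side-by-side x positions for the two bars of each node. The values are hard-coded for illustration, and width, positions, in_positions and out_positions are hypothetical names.

```python
# illustrative (node, in_degree, out_degree) values, hard-coded for the sketch
rows = [('a', 1, 2), ('b', 4, 3), ('c', 2, 3), ('d', 3, 2)]

# add the total degree as a fourth field, then sort by it
with_total = [(n, i, o, i + o) for n, i, o in rows]
with_total.sort(key=lambda t: t[3])

width = 0.4                                        # hypothetical bar width
positions = list(range(len(with_total)))           # one slot per node
in_positions = [p - width/2 for p in positions]    # left bar of each pair
out_positions = [p + width/2 for p in positions]   # right bar of each pair

print([t[0] for t in with_total])
```

Each list of positions could then be passed to its own plt.bar call with the same width, while positions and the node names go to plt.xticks.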

[43]:

# write here






## Frequency histogram¶

Now let’s try to draw degree frequencies, that is, for each degree present in the graph we want to display a bar as high as the number of times that particular degree appears.

For doing so, we will need a matplotlib histogram, see documentation

We will need to tell matplotlib how many columns we want, which in histogram terms are called bins. We also need to give the histogram a series of numbers so it can count how many times each number occurs. Let’s consider this graph G2:

[45]:

import networkx as nx

G2=nx.DiGraph({
'a':['b','c'],
'b':['b','c', 'd'],
'c':['a','b','d'],
'd':['b', 'd','e'],
'e':[],
'f':['c','d','e'],
'g':['e','g']
})

draw_nx(G2)



If we take the degree sequence of G2 we get this:

[46]:

degrees_G2 = [G2.degree(n) for n in G2.nodes()]

degrees_G2

[46]:

[7, 3, 7, 3, 3, 3, 6]


We see 3 appears four times, 6 once, and 7 twice.

Let’s try to determine a good number for the bins. First we can check the boundaries our x axis should have:

[47]:

min(degrees_G2)

[47]:

3

[48]:

max(degrees_G2)

[48]:

7


So the x axis of our histogram must go at least from 3 to 7. If we want integer columns (bins), we will need ticks going from 3 included to 7 included, so at least ticks for 3, 4, 5, 6, 7. To get a precise display, when we have integer x values it is best to also provide manually the sequence of bin edges, remembering it should start from the minimum included (in our case, 3) and arrive at the maximum + 1 included (in our case, 7 + 1 = 8).

NOTE: precise histogram drawing can be quite tricky, please do read this StackOverflow post for more details about it.
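To see what these bin edges do before drawing anything, the counting can be reproduced with np.histogram, which matplotlib's documentation names as the function used by plt.hist for binning. This is only a sketch: the degree list is copied from the output above, and centers is an illustrative name.

```python
import numpy as np

degrees = [7, 3, 7, 3, 3, 3, 6]   # the degree sequence of G2 shown above

# edges 3,4,5,6,7,8 define five bins: [3,4) [4,5) [5,6) [6,7) [7,8]
counts, edges = np.histogram(degrees, bins=list(range(3, 9)))

# middle point of each bin, a candidate position for centered x ticks
centers = [(left + right) / 2 for left, right in zip(edges[:-1], edges[1:])]

print(counts)
print(centers)
```

Note that only the last bin includes its right edge, which is why the edges must reach the maximum + 1 for the value 7 to get a bin of its own.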

[49]:


import matplotlib.pyplot as plt
import numpy as np

degrees = [G2.degree(n) for n in G2.nodes()]

# in this case hist returns a tuple of three values,
# which we unpack into three variables
n, bins, columns = plt.hist(degrees_G2,
                            bins=range(3,9),  #  bin edges: 3 *included* , 4, 5, 6, 7, 8 *included*
                            width=1.0)        #  graphical width of the bars

plt.xlabel('Degrees')
plt.ylabel('Frequency counts')
plt.title('G2 Degree distribution')
plt.xlim(0, max(degrees) + 2)
plt.show()



As expected we see 3 is counted four times, 6 once, and 7 twice.

✪✪✪ EXERCISE: Still, it would be visually better to align the x ticks to the middle of the bars with xticks, and also to make the graph tighter by setting the xlim appropriately. This is not always easy to do.

Read carefully this StackOverflow post and try to do it by yourself.

NOTE: set one thing at a time and check whether it works (i.e. first xticks and then xlim), doing everything at once might get quite confusing

[50]:

# write here the solution






## Showing plots side by side¶

You can display plots on a grid. Each cell in the grid is identified by a single number. For example, for a grid of two rows and three columns, you would have cells indexed like this:

1 2 3
4 5 6
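The numbering goes row by row, starting from 1. A small sketch of the arithmetic follows; cell_index is a hypothetical helper of ours, not a matplotlib function.

```python
nrows, ncols = 2, 3

def cell_index(row, col, ncols):
    """Return the 1-based subplot index for a 0-based (row, col) position."""
    return row * ncols + col + 1

# rebuild the grid shown above, row by row
grid = [[cell_index(r, c, ncols) for c in range(ncols)] for r in range(nrows)]
for line in grid:
    print(*line)
```

The value returned by cell_index is what would be passed as the third argument of plt.subplot.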

[52]:

%matplotlib inline
import matplotlib.pyplot as plt
import math

xs = [1,2,3,4,5,6]

# cells:
# 1 2 3
# 4 5 6

plt.subplot(2,   # 2 rows
3,   # 3 columns
1)   # plotting in first cell
ys1 = [x**3 for x in xs]
plt.plot(xs, ys1)
plt.title('first cell')

plt.subplot(2,   # 2 rows
3,   # 3 columns
2)   # plotting in second cell

ys2 = [2*x + 1 for x in xs]
plt.plot(xs,ys2)
plt.title('2nd cell')

plt.subplot(2,   # 2 rows
3,   # 3 columns
3)   # plotting in third cell

ys3 = [-2*x + 1 for x in xs]
plt.plot(xs,ys3)
plt.title('3rd cell')

plt.subplot(2,   # 2 rows
3,   # 3 columns
4)   # plotting in fourth cell

ys4 = [-2*x**2 for x in xs]
plt.plot(xs,ys4)
plt.title('4th cell')

plt.subplot(2,   # 2 rows
3,   # 3 columns
5)   # plotting in fifth cell

ys5 = [math.sin(x) for x in xs]
plt.plot(xs,ys5)
plt.title('5th cell')

plt.subplot(2,   # 2 rows
3,   # 3 columns
6)   # plotting in sixth cell

ys6 = [-math.cos(x) for x in xs]
plt.plot(xs,ys6)
plt.title('6th cell')

plt.show()


### Graph models¶

Let’s study frequencies of some known network types.

#### Erdős–Rényi model¶

✪✪ A simple graph model we can think of is the so-called Erdős–Rényi model: it is an undirected graph where we have n nodes, and each pair of nodes is connected with probability p. In networkx, we can generate a random one by issuing this command:

[53]:

G = nx.erdos_renyi_graph(10, 0.5)
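To make the model definition concrete, here is a hand-rolled sketch in plain Python that follows the description literally (one coin flip per pair of nodes). It is not how networkx implements it; random_graph_degrees and its seed parameter are our own illustrative choices.

```python
import random
from itertools import combinations

def random_graph_degrees(n, p, seed=0):
    """Return the degree of each node of a hand-rolled G(n, p) graph."""
    rng = random.Random(seed)          # fixed seed, only for reproducibility
    degrees = [0] * n
    # every unordered pair of distinct nodes gets an edge with probability p
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            degrees[u] += 1
            degrees[v] += 1
    return degrees

n, p = 200, 0.5
degrees = random_graph_degrees(n, p)
mean_degree = sum(degrees) / n

print(mean_degree, p * (n - 1))   # observed mean vs. expected degree p*(n-1)
```

Since each node has n - 1 possible neighbours, each taken with probability p, the observed mean degree should land close to p*(n-1).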


In the drawing, the absence of arrows confirms it is undirected:

[54]:

draw_nx(G)


Try plotting the degree distribution for different values of p (0.1, 0.5, 0.9) with a fixed n=1000, putting them side by side on the same row. What do their distributions look like? Where are they centered?

To avoid rewriting the same code again and again, define a plot_erdos(n,p,j) function to be called three times.

[55]:

# write here the solution








## Other plots¶

Matplotlib can display pretty much any chart you might like. Here we collect some we use in the course; for others, see the extensive Matplotlib documentation.

### Pie chart¶

[57]:

%matplotlib inline
import matplotlib.pyplot as plt

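labels = ['Oranges', 'Apples', 'Cucumbers']
fracs = [14, 23, 5]   # how much for each sector, note it doesn't need to add up to 100

# draw the pie: one sector per value, annotated with its label and percentage
plt.pie(fracs, labels=labels, autopct='%1.1f%%')
plt.title("Super strict vegan diet (good luck)")
plt.show()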