Analytics with Pandas : 2. Advanced operations
Download exercises zip
Let’s see how to do more advanced operations with pandas, like grouping with groupby
, joining tables with merge
, and perform geospatial analysis with GeoPandas (only mentioned).
We chose to collect such topics in this notebook as tipically while executing these operations problems are more likely to arise and thus some further internet search is required.
1. Grouping
Reference:
To group items and perform statistics on each group, you can use the groupby
method.
Let’s see an example of a possible grouping. First we reload again the astropi.csv described in the previous tutorial
[1]:
import pandas as pd
import numpy as np
df = pd.read_csv('astropi.csv', encoding='UTF-8')
Suppose we want to calculate how many readings of humidity
fall into the interval defined by each integer value of humidity humidity_int
, so to be able to plot a bar chart like this (actually there are faster methods with numpy for making histograms but here we follow the step by step approach)
1.1 Let’s see a group
To get an initial idea, we could start checking only the rows that belong to the group 42
, that is have a humidity value lying between 42.0
included until 43.0
excluded. We can use the transform method as previously seen, noting that group 42
holds 2776
rows:
[2]:
df[ df['humidity'].transform(int) == 42]
[2]:
time_stamp | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | mag_y | mag_z | accel_x | accel_y | accel_z | gyro_x | gyro_y | gyro_z | reset | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
19222 | 2016-02-18 16:37:00 | 33.18 | 28.96 | 26.51 | 42.99 | 1006.10 | 1.19 | 53.23 | 313.69 | 9.081925 | -32.244905 | -35.135448 | -0.000581 | 0.018936 | 0.014607 | 0.000563 | 0.000346 | -0.000113 | 0 |
19619 | 2016-02-18 17:43:50 | 33.34 | 29.06 | 26.62 | 42.91 | 1006.30 | 1.50 | 52.54 | 194.49 | -53.197113 | -4.014863 | -20.257249 | -0.000439 | 0.018838 | 0.014526 | -0.000259 | 0.000323 | -0.000181 | 0 |
19621 | 2016-02-18 17:44:10 | 33.38 | 29.06 | 26.62 | 42.98 | 1006.28 | 1.01 | 52.89 | 195.39 | -52.911983 | -4.207085 | -20.754475 | -0.000579 | 0.018903 | 0.014580 | 0.000415 | -0.000232 | 0.000400 | 0 |
19655 | 2016-02-18 17:49:51 | 33.37 | 29.07 | 26.62 | 42.94 | 1006.28 | 0.93 | 53.21 | 203.76 | -43.124080 | -8.181511 | -29.151436 | -0.000432 | 0.018919 | 0.014608 | 0.000182 | 0.000341 | 0.000015 | 0 |
19672 | 2016-02-18 17:52:40 | 33.33 | 29.06 | 26.62 | 42.93 | 1006.24 | 1.34 | 52.71 | 206.97 | -36.893841 | -10.130503 | -31.484077 | -0.000551 | 0.018945 | 0.014794 | -0.000378 | -0.000013 | -0.000101 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
110864 | 2016-02-29 09:24:21 | 31.56 | 27.52 | 24.83 | 42.94 | 1005.83 | 1.58 | 49.93 | 129.60 | -15.169673 | -27.642610 | 1.563183 | -0.000682 | 0.017743 | 0.014646 | -0.000264 | 0.000206 | 0.000196 | 0 |
110865 | 2016-02-29 09:24:30 | 31.55 | 27.50 | 24.83 | 42.72 | 1005.85 | 1.89 | 49.92 | 130.51 | -15.832622 | -27.729389 | 1.785682 | -0.000736 | 0.017570 | 0.014855 | 0.000143 | 0.000199 | -0.000024 | 0 |
110866 | 2016-02-29 09:24:41 | 31.58 | 27.50 | 24.83 | 42.83 | 1005.85 | 2.09 | 50.00 | 132.04 | -16.646212 | -27.719479 | 1.629533 | -0.000647 | 0.017657 | 0.014799 | 0.000537 | 0.000257 | 0.000057 | 0 |
110867 | 2016-02-29 09:24:50 | 31.62 | 27.50 | 24.83 | 42.81 | 1005.88 | 2.88 | 49.69 | 133.00 | -17.270447 | -27.793136 | 1.703806 | -0.000835 | 0.017635 | 0.014877 | 0.000534 | 0.000456 | 0.000195 | 0 |
110868 | 2016-02-29 09:25:00 | 31.57 | 27.51 | 24.83 | 42.94 | 1005.86 | 2.17 | 49.77 | 134.18 | -17.885872 | -27.824149 | 1.293345 | -0.000787 | 0.017261 | 0.014380 | 0.000459 | 0.000076 | 0.000030 | 0 |
2776 rows × 19 columns
1.2 groupby
We can generlize and associate to each integer group the amount of rows belonging to that group with the groupby
method. First let’s make a column holding the integer humidity value for each group:
[3]:
df['humidity_int'] = df['humidity'].transform( lambda x: int(x) )
[4]:
df[ ['time_stamp', 'humidity_int', 'humidity'] ].head()
[4]:
time_stamp | humidity_int | humidity | |
---|---|---|---|
0 | 2016-02-16 10:44:40 | 44 | 44.94 |
1 | 2016-02-16 10:44:50 | 45 | 45.12 |
2 | 2016-02-16 10:45:00 | 45 | 45.12 |
3 | 2016-02-16 10:45:10 | 45 | 45.32 |
4 | 2016-02-16 10:45:20 | 45 | 45.18 |
Then we can call groupby
by writing down:
first the column where to group (
humidity_int
)second the column where to calculate the statistics
finally the statistics to be performed, in this case
.count()
(other common ones aresum()
,min()
,max()
,mean()
…)
[5]:
df.groupby(['humidity_int'])['humidity'].count()
[5]:
humidity_int
42 2776
43 2479
44 13029
45 32730
46 35775
47 14176
48 7392
49 297
50 155
51 205
52 209
53 128
54 224
55 164
56 139
57 183
58 237
59 271
60 300
Name: humidity, dtype: int64
Note the result is a Series
:
[6]:
result = df.groupby(['humidity_int'])['humidity'].count()
[7]:
type(result)
[7]:
pandas.core.series.Series
Since we would like a customized bar chart, for the sake of simplicity we could use the native plt.plot
function of matplotlib, for which we will need one sequence for \(xs\) coordinates and another one for the \(ys\).
The sequence for \(xs\) can be extracted from the index of the Series
:
[8]:
result.index
[8]:
Int64Index([42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
59, 60],
dtype='int64', name='humidity_int')
For the \(ys\) sequence we can directly use the Series
like this:
[9]:
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
plt.bar(result.index, result)
plt.xlabel('humidity groups')
plt.ylabel('count')
plt.title('Humidity distribution')
plt.xticks(result.index, result.index) # shows labels as integers
plt.tick_params(bottom=False) # removes the little bottom lines
plt.show()

1.3 Modifying a dataframe by plugging in the result of a grouping
Notice we’ve got only 19 rows in the grouped series:
[10]:
df.groupby(['humidity_int'])['humidity'].count()
[10]:
humidity_int
42 2776
43 2479
44 13029
45 32730
46 35775
47 14176
48 7392
49 297
50 155
51 205
52 209
53 128
54 224
55 164
56 139
57 183
58 237
59 271
60 300
Name: humidity, dtype: int64
How could we fill the whole original table, assigning to each row the count of its own group?
We can use transform
like this:
[11]:
df.groupby(['humidity_int'])['humidity'].transform('count')
[11]:
0 13029
1 32730
2 32730
3 32730
4 32730
...
110864 2776
110865 2776
110866 2776
110867 2776
110868 2776
Name: humidity, Length: 110869, dtype: int64
As usual, group_by
does not modify the dataframe, if we want the result stored in the dataframe we need to assign the result to a new column:
[12]:
df['humidity_counts'] = df.groupby(['humidity_int'])['humidity'].transform('count')
[13]:
df
[13]:
time_stamp | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | ... | mag_z | accel_x | accel_y | accel_z | gyro_x | gyro_y | gyro_z | reset | humidity_int | humidity_counts | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-02-16 10:44:40 | 31.88 | 27.57 | 25.01 | 44.94 | 1001.68 | 1.49 | 52.25 | 185.21 | -46.422753 | ... | -12.129346 | -0.000468 | 0.019439 | 0.014569 | 0.000942 | 0.000492 | -0.000750 | 20 | 44 | 13029 |
1 | 2016-02-16 10:44:50 | 31.79 | 27.53 | 25.01 | 45.12 | 1001.72 | 1.03 | 53.73 | 186.72 | -48.778951 | ... | -12.943096 | -0.000614 | 0.019436 | 0.014577 | 0.000218 | -0.000005 | -0.000235 | 0 | 45 | 32730 |
2 | 2016-02-16 10:45:00 | 31.66 | 27.53 | 25.01 | 45.12 | 1001.72 | 1.24 | 53.57 | 186.21 | -49.161878 | ... | -12.642772 | -0.000569 | 0.019359 | 0.014357 | 0.000395 | 0.000600 | -0.000003 | 0 | 45 | 32730 |
3 | 2016-02-16 10:45:10 | 31.69 | 27.52 | 25.01 | 45.32 | 1001.69 | 1.57 | 53.63 | 186.03 | -49.341941 | ... | -12.615509 | -0.000575 | 0.019383 | 0.014409 | 0.000308 | 0.000577 | -0.000102 | 0 | 45 | 32730 |
4 | 2016-02-16 10:45:20 | 31.66 | 27.54 | 25.01 | 45.18 | 1001.71 | 0.85 | 53.66 | 186.46 | -50.056683 | ... | -12.678341 | -0.000548 | 0.019378 | 0.014380 | 0.000321 | 0.000691 | 0.000272 | 0 | 45 | 32730 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
110864 | 2016-02-29 09:24:21 | 31.56 | 27.52 | 24.83 | 42.94 | 1005.83 | 1.58 | 49.93 | 129.60 | -15.169673 | ... | 1.563183 | -0.000682 | 0.017743 | 0.014646 | -0.000264 | 0.000206 | 0.000196 | 0 | 42 | 2776 |
110865 | 2016-02-29 09:24:30 | 31.55 | 27.50 | 24.83 | 42.72 | 1005.85 | 1.89 | 49.92 | 130.51 | -15.832622 | ... | 1.785682 | -0.000736 | 0.017570 | 0.014855 | 0.000143 | 0.000199 | -0.000024 | 0 | 42 | 2776 |
110866 | 2016-02-29 09:24:41 | 31.58 | 27.50 | 24.83 | 42.83 | 1005.85 | 2.09 | 50.00 | 132.04 | -16.646212 | ... | 1.629533 | -0.000647 | 0.017657 | 0.014799 | 0.000537 | 0.000257 | 0.000057 | 0 | 42 | 2776 |
110867 | 2016-02-29 09:24:50 | 31.62 | 27.50 | 24.83 | 42.81 | 1005.88 | 2.88 | 49.69 | 133.00 | -17.270447 | ... | 1.703806 | -0.000835 | 0.017635 | 0.014877 | 0.000534 | 0.000456 | 0.000195 | 0 | 42 | 2776 |
110868 | 2016-02-29 09:25:00 | 31.57 | 27.51 | 24.83 | 42.94 | 1005.86 | 2.17 | 49.77 | 134.18 | -17.885872 | ... | 1.293345 | -0.000787 | 0.017261 | 0.014380 | 0.000459 | 0.000076 | 0.000030 | 0 | 42 | 2776 |
110869 rows × 21 columns
1.4 Exercise - meteo pressure intervals
✪✪✪ The dataset meteo.csv contains the weather data of Trento, November 2017 (source: www.meteotrentino.it). We would like to subdivide the pressure readings into three intervals A (low)
, B (medium)
, C (high)
, and count how many readings have been made for each interval.
IMPORTANT: assign the dataframe to a variable called meteo
so to avoid confusion with other dataframes
1.4.1 Where are the intervals?
First, let’s find the pressure values for these 3 intervals and plot them as segments, so to end up with a chart like this:
Before doing the plot, we will need to know at which height we should plot the segments.
Load the dataset with pandas, calculate the following variables and PRINT them
use
UTF-8
as encodinground values with
round
functionthe excursion is the difference between minimum and maximum
note
intervalC
coincides with the maximum
DO NOT use min
and max
as variable names (they are reserved functions!!)
[14]:
import pandas as pd
# write here
minimum: 966.3
maximum: 998.3
excursion: 32.0
intervalA: 976.97
intervalB: 987.63
intervalC: 998.3
1.4.2 Segments plot
Try now to plot the chart of pressure and the 4 horizontal segments.
to overlay the segments with different colors, just make repeated calls to
plt.plot
a segment is defined by two points: so just find the coordinates of those two points..
try leaving some space above and below the chart
REMEMBER title and labels
Show solution[15]:
# write here

1.4.3 Assigning the intervals
We literally made a picture of where the intervals are located - let’s now ask ourselves how many readings have been done for each interval.
First, try creating a column which assigns to each reading the interval where it belongs to.
HINT 1: use
transform
HINT 2: in the function you are going to define, do not recalculate inside values such as minimum, maximum, intervals etc because it would slow down Pandas. Instead, use the variables we’ve already defined - remember that
transform
riexecutes the argument function for each row!
[16]:
# write here
[16]:
Date | Pressure | Rain | Temp | PressureInterval | |
---|---|---|---|---|---|
0 | 01/11/2017 00:00 | 995.4 | 0.0 | 5.4 | C (high) |
1 | 01/11/2017 00:15 | 995.5 | 0.0 | 6.0 | C (high) |
2 | 01/11/2017 00:30 | 995.5 | 0.0 | 5.9 | C (high) |
3 | 01/11/2017 00:45 | 995.7 | 0.0 | 5.4 | C (high) |
4 | 01/11/2017 01:00 | 995.7 | 0.0 | 5.3 | C (high) |
... | ... | ... | ... | ... | ... |
2873 | 30/11/2017 23:00 | 980.0 | 0.0 | 0.2 | B (medium) |
2874 | 30/11/2017 23:15 | 980.2 | 0.0 | 0.5 | B (medium) |
2875 | 30/11/2017 23:30 | 980.2 | 0.0 | 0.6 | B (medium) |
2876 | 30/11/2017 23:45 | 980.5 | 0.0 | 0.2 | B (medium) |
2877 | 01/12/2017 00:00 | 980.6 | 0.0 | -0.3 | B (medium) |
2878 rows × 5 columns
1.4.4 Grouping by intervals
We would like to have an histogram like this one:
First, create a grouping to count occurrences:
[17]:
# write here
[17]:
PressureInterval
A (low) 255
B (medium) 1243
C (high) 1380
Name: Pressure, dtype: int64
Now plot it
NOTE: the result of
groupby
is also aSeries
, so it’s plottable as we’ve already seen…REMEMBER title and axis labels
[18]:
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
# write here

1.5 Exercise - meteo average temperature
✪✪✪ Calculate the average temperature for each day, and show it in the plot, so to have a couple new columns like these:
Day Avg_day_temp
01/11/2017 7.983333
01/11/2017 7.983333
01/11/2017 7.983333
. .
. .
02/11/2017 7.384375
02/11/2017 7.384375
02/11/2017 7.384375
. .
. .
HINT 1: add 'Day'
column by extracting only the day from the date. To do it, use the function .str
applied to all the column.
HINT 2: There are various ways to solve the exercise:
Most perfomant and elegant is with
groupby
operator, see Pandas trasform - more than meets the eyeAs alternative, you may use a
for
to cycle through days. Typically, using afor
is not a good idea with Pandas, as on large datasets it can take a lot to perform the updates. Still, since this dataset is small enough, you should get results in a decent amount of time.
[19]:
# write here
[20]:
****SOLUTION 1 (EFFICIENT) - best solution with groupby and transform
Date Pressure Rain Temp Day Avg_day_temp
0 01/11/2017 00:00 995.4 0.0 5.4 01/11/2017 7.983333
1 01/11/2017 00:15 995.5 0.0 6.0 01/11/2017 7.983333
2 01/11/2017 00:30 995.5 0.0 5.9 01/11/2017 7.983333
3 01/11/2017 00:45 995.7 0.0 5.4 01/11/2017 7.983333
4 01/11/2017 01:00 995.7 0.0 5.3 01/11/2017 7.983333

[21]:
[22]:
2. Merging tables
Suppose we want to add a column with geographical position of the ISS. To do so, we would need to join our dataset with another one containing such information. Let’s take for example the dataset iss-coords.csv
[23]:
iss_coords = pd.read_csv('iss-coords.csv', encoding='UTF-8')
[24]:
iss_coords
[24]:
timestamp | lat | lon | |
---|---|---|---|
0 | 2016-01-01 05:11:30 | -45.103458 | 14.083858 |
1 | 2016-01-01 06:49:59 | -37.597242 | 28.931170 |
2 | 2016-01-01 11:52:30 | 17.126141 | 77.535602 |
3 | 2016-01-01 11:52:30 | 17.126464 | 77.535861 |
4 | 2016-01-01 14:54:08 | 7.259561 | 70.001561 |
... | ... | ... | ... |
333 | 2016-02-29 13:23:17 | -51.077590 | -31.093987 |
334 | 2016-02-29 13:44:13 | 30.688553 | -135.403820 |
335 | 2016-02-29 13:44:13 | 30.688295 | -135.403533 |
336 | 2016-02-29 18:44:57 | 27.608774 | -130.198781 |
337 | 2016-02-29 21:36:47 | 27.325186 | -129.893278 |
338 rows × 3 columns
We notice there is a timestamp
column, which unfortunately has a slightly different name that time_stamp
column (notice the underscore _
) in original astropi dataset:
[25]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110869 entries, 0 to 110868
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time_stamp 110869 non-null object
1 temp_cpu 110869 non-null float64
2 temp_h 110869 non-null float64
3 temp_p 110869 non-null float64
4 humidity 110869 non-null float64
5 pressure 110869 non-null float64
6 pitch 110869 non-null float64
7 roll 110869 non-null float64
8 yaw 110869 non-null float64
9 mag_x 110869 non-null float64
10 mag_y 110869 non-null float64
11 mag_z 110869 non-null float64
12 accel_x 110869 non-null float64
13 accel_y 110869 non-null float64
14 accel_z 110869 non-null float64
15 gyro_x 110869 non-null float64
16 gyro_y 110869 non-null float64
17 gyro_z 110869 non-null float64
18 reset 110869 non-null int64
19 humidity_int 110869 non-null int64
20 humidity_counts 110869 non-null int64
dtypes: float64(17), int64(3), object(1)
memory usage: 17.8+ MB
To merge datasets according to the columns, we can use the command merge
like this:
[26]:
# remember merge produces a NEW dataframe
geo_astropi = df.merge(iss_coords, left_on='time_stamp', right_on='timestamp')
# merge will add both time_stamp and timestamp columns,
# so we remove the duplicate column `timestamp`
geo_astropi = geo_astropi.drop('timestamp', axis=1)
[27]:
geo_astropi
[27]:
time_stamp | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | ... | accel_y | accel_z | gyro_x | gyro_y | gyro_z | reset | humidity_int | humidity_counts | lat | lon | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-02-19 03:49:00 | 32.53 | 28.37 | 25.89 | 45.31 | 1006.04 | 1.31 | 51.63 | 34.91 | 21.125001 | ... | 0.018851 | 0.014607 | 0.000060 | -0.000304 | 0.000046 | 0 | 45 | 32730 | 31.434741 | 52.917464 |
1 | 2016-02-19 14:30:40 | 32.30 | 28.12 | 25.62 | 45.57 | 1007.42 | 1.49 | 52.29 | 333.49 | 16.083471 | ... | 0.018687 | 0.014502 | 0.000208 | -0.000499 | 0.000034 | 0 | 45 | 32730 | -46.620658 | -57.311657 |
2 | 2016-02-19 14:30:40 | 32.30 | 28.12 | 25.62 | 45.57 | 1007.42 | 1.49 | 52.29 | 333.49 | 16.083471 | ... | 0.018687 | 0.014502 | 0.000208 | -0.000499 | 0.000034 | 0 | 45 | 32730 | -46.620477 | -57.311138 |
3 | 2016-02-21 22:14:11 | 32.21 | 28.05 | 25.50 | 47.36 | 1012.41 | 0.67 | 52.40 | 27.57 | 15.441683 | ... | 0.018800 | 0.014136 | -0.000015 | -0.000159 | 0.000221 | 0 | 47 | 14176 | 19.138359 | -140.211489 |
4 | 2016-02-23 23:40:50 | 32.32 | 28.18 | 25.61 | 47.45 | 1010.62 | 1.14 | 51.41 | 33.68 | 11.994554 | ... | 0.018276 | 0.014124 | 0.000368 | 0.000368 | 0.000030 | 0 | 47 | 14176 | 4.713819 | 80.261665 |
5 | 2016-02-24 10:05:51 | 32.39 | 28.26 | 25.70 | 46.83 | 1010.51 | 0.61 | 51.91 | 287.86 | 6.554283 | ... | 0.018352 | 0.014344 | -0.000664 | -0.000518 | 0.000171 | 0 | 46 | 35775 | -46.061583 | 22.246025 |
6 | 2016-02-25 00:23:01 | 32.38 | 28.18 | 25.62 | 46.52 | 1008.28 | 0.90 | 51.77 | 30.80 | 9.947132 | ... | 0.018502 | 0.014366 | 0.000290 | 0.000314 | -0.000375 | 0 | 46 | 35775 | 47.047346 | 137.958918 |
7 | 2016-02-27 01:43:10 | 32.42 | 28.34 | 25.76 | 45.72 | 1006.79 | 0.57 | 49.85 | 10.57 | 7.805606 | ... | 0.017930 | 0.014378 | -0.000026 | -0.000013 | -0.000047 | 0 | 45 | 32730 | -41.049112 | 30.193004 |
8 | 2016-02-27 01:43:10 | 32.42 | 28.34 | 25.76 | 45.72 | 1006.79 | 0.57 | 49.85 | 10.57 | 7.805606 | ... | 0.017930 | 0.014378 | -0.000026 | -0.000013 | -0.000047 | 0 | 45 | 32730 | -8.402991 | -100.981726 |
9 | 2016-02-28 09:48:40 | 32.62 | 28.62 | 26.02 | 45.15 | 1006.06 | 1.12 | 50.44 | 301.74 | 10.348327 | ... | 0.017620 | 0.014725 | -0.000358 | -0.000301 | -0.000061 | 0 | 45 | 32730 | 50.047523 | 175.566751 |
10 rows × 23 columns
Exercise 2.1 - better merge
If you notice, above table does have lat
and lon
columns, but has very few rows. Why ? Try to merge the tables in some meaningful way so to have all the original rows and all cells of lat
and lon
filled.
For other merging stategies, read about attribute
how
in Why And How To Use Merge With Pandas in PythonTo fill missing values don’t use fancy interpolation techniques, just put the station position in that given day or hour
[28]:
# write here
[28]:
time_stamp | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | ... | accel_z | gyro_x | gyro_y | gyro_z | reset | humidity_int | humidity_counts | timestamp | lat | lon | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-02-16 10:44:40 | 31.88 | 27.57 | 25.01 | 44.94 | 1001.68 | 1.49 | 52.25 | 185.21 | -46.422753 | ... | 0.014569 | 0.000942 | 0.000492 | -0.000750 | 20 | 44 | 13029 | NaN | NaN | NaN |
1 | 2016-02-16 10:44:50 | 31.79 | 27.53 | 25.01 | 45.12 | 1001.72 | 1.03 | 53.73 | 186.72 | -48.778951 | ... | 0.014577 | 0.000218 | -0.000005 | -0.000235 | 0 | 45 | 32730 | NaN | NaN | NaN |
2 | 2016-02-16 10:45:00 | 31.66 | 27.53 | 25.01 | 45.12 | 1001.72 | 1.24 | 53.57 | 186.21 | -49.161878 | ... | 0.014357 | 0.000395 | 0.000600 | -0.000003 | 0 | 45 | 32730 | NaN | NaN | NaN |
3 | 2016-02-16 10:45:10 | 31.69 | 27.52 | 25.01 | 45.32 | 1001.69 | 1.57 | 53.63 | 186.03 | -49.341941 | ... | 0.014409 | 0.000308 | 0.000577 | -0.000102 | 0 | 45 | 32730 | NaN | NaN | NaN |
4 | 2016-02-16 10:45:20 | 31.66 | 27.54 | 25.01 | 45.18 | 1001.71 | 0.85 | 53.66 | 186.46 | -50.056683 | ... | 0.014380 | 0.000321 | 0.000691 | 0.000272 | 0 | 45 | 32730 | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
110866 | 2016-02-29 09:24:21 | 31.56 | 27.52 | 24.83 | 42.94 | 1005.83 | 1.58 | 49.93 | 129.60 | -15.169673 | ... | 0.014646 | -0.000264 | 0.000206 | 0.000196 | 0 | 42 | 2776 | NaN | NaN | NaN |
110867 | 2016-02-29 09:24:30 | 31.55 | 27.50 | 24.83 | 42.72 | 1005.85 | 1.89 | 49.92 | 130.51 | -15.832622 | ... | 0.014855 | 0.000143 | 0.000199 | -0.000024 | 0 | 42 | 2776 | NaN | NaN | NaN |
110868 | 2016-02-29 09:24:41 | 31.58 | 27.50 | 24.83 | 42.83 | 1005.85 | 2.09 | 50.00 | 132.04 | -16.646212 | ... | 0.014799 | 0.000537 | 0.000257 | 0.000057 | 0 | 42 | 2776 | NaN | NaN | NaN |
110869 | 2016-02-29 09:24:50 | 31.62 | 27.50 | 24.83 | 42.81 | 1005.88 | 2.88 | 49.69 | 133.00 | -17.270447 | ... | 0.014877 | 0.000534 | 0.000456 | 0.000195 | 0 | 42 | 2776 | NaN | NaN | NaN |
110870 | 2016-02-29 09:25:00 | 31.57 | 27.51 | 24.83 | 42.94 | 1005.86 | 2.17 | 49.77 | 134.18 | -17.885872 | ... | 0.014380 | 0.000459 | 0.000076 | 0.000030 | 0 | 42 | 2776 | NaN | NaN | NaN |
110871 rows × 24 columns
3. GeoPandas
You can easily manipulate geographical data with GeoPandas library. For some nice online tutorial, we refer to Geospatial Analysis and Representation for Data Science course website @ master in Data Science University of Trento, by Maurizio Napolitano (FBK)
Continue
Go on with the challenges worksheet