We want to predict hurricane paths from historical records. I decided to use this final exercise to deploy my first time series forecasting model. The problem is treated as sequence prediction and addressed with an RNN. I closely follow the approach of Alemany et al. (2019), where observations are mapped onto a grid based on geographic coordinates and the task is to predict the next grid ID the hurricane will occupy, given previous sequences of grid IDs, distance traveled, and bearing. My goal is to replicate the approach and understand it in as much detail as possible, while making some changes based on my own considerations.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy
import re
from geopy.distance import great_circle as gcd
import math as Math
import time
from mpl_toolkits.basemap import Basemap
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
import keras
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import LSTM
from keras.models import model_from_json
The data is loaded. Almost all of the original columns will be dropped, on the grounds that most of the records describe older phenomena for which atmospheric variables such as wind speed or pressure were not captured. We keep the hurricane ID, the date, the geographic coordinates of the center, and the maximum wind speed. Name is also dropped because it is only useful for events after 1950.
data = pd.read_csv('atlantic.csv')
Import and clean data
data
 | ID | Name | Date | Time | Event | Status | Latitude | Longitude | Maximum Wind | Minimum Pressure | ... | Low Wind SW | Low Wind NW | Moderate Wind NE | Moderate Wind SE | Moderate Wind SW | Moderate Wind NW | High Wind NE | High Wind SE | High Wind SW | High Wind NW
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | AL011851 | UNNAMED | 18510625 | 0 |  | HU | 28.0N | 94.8W | 80 | -999 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999
1 | AL011851 | UNNAMED | 18510625 | 600 |  | HU | 28.0N | 95.4W | 80 | -999 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999
2 | AL011851 | UNNAMED | 18510625 | 1200 |  | HU | 28.0N | 96.0W | 80 | -999 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999
3 | AL011851 | UNNAMED | 18510625 | 1800 |  | HU | 28.1N | 96.5W | 80 | -999 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999
4 | AL011851 | UNNAMED | 18510625 | 2100 | L | HU | 28.2N | 96.8W | 80 | -999 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
49100 | AL122015 | KATE | 20151112 | 1200 |  | EX | 41.3N | 50.4W | 55 | 981 | ... | 180 | 120 | 120 | 120 | 60 | 0 | 0 | 0 | 0 | 0
49101 | AL122015 | KATE | 20151112 | 1800 |  | EX | 41.9N | 49.9W | 55 | 983 | ... | 180 | 120 | 120 | 120 | 60 | 0 | 0 | 0 | 0 | 0
49102 | AL122015 | KATE | 20151113 | 0 |  | EX | 41.5N | 49.2W | 50 | 985 | ... | 200 | 220 | 120 | 120 | 60 | 0 | 0 | 0 | 0 | 0
49103 | AL122015 | KATE | 20151113 | 600 |  | EX | 40.8N | 47.5W | 45 | 985 | ... | 180 | 220 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
49104 | AL122015 | KATE | 20151113 | 1200 |  | EX | 40.7N | 45.4W | 45 | 987 | ... | 150 | 220 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
49105 rows × 22 columns
The data is analyzed to eliminate variables with less than 50% valid data, treating -999 and UNNAMED as missing values. Following this approach, most of the variables are eliminated, conserving only the ID, the geographic coordinates, and the maximum wind velocity. As examples, the value frequencies of Name and Minimum Pressure are shown.
data['Minimum Pressure'].value_counts(normalize=True)*100
-999 62.455962 1005 1.798188 1008 1.722839 1006 1.645454 1009 1.629162 ... 902 0.002036 899 0.002036 889 0.002036 888 0.002036 907 0.002036 Name: Minimum Pressure, Length: 130, dtype: float64
data['Name'].value_counts(normalize=True)*100
UNNAMED 54.094288 FRANCES 0.645555 ARLENE 0.576316 BERTHA 0.545769 DENNIS 0.519295 ... SHARY 0.018328 TAMMY 0.018328 SIXTEEN 0.016292 FERNAND 0.014255 AMELIA 0.012219 Name: Name, Length: 288, dtype: float64
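The same check can be run for every column at once; a minimal sketch that treats -999 and 'UNNAMED' as missing (the variable name is illustrative):
# Share of missing values per column, in percent
missing_share = data.replace([-999, 'UNNAMED'], np.nan).isna().mean() * 100
print(missing_share.sort_values(ascending=False).head(10))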
data=data.drop(['Minimum Pressure','Event','Status','Name','Low Wind NE','Low Wind SE','Low Wind SW','Low Wind NW','Moderate Wind NE','Moderate Wind SE','Moderate Wind SW','Moderate Wind NW','High Wind SW','High Wind NE','High Wind SE','High Wind NW'],axis=1)
We manipulate the data so we can access records through an index built on the ID key. Then we count the total number of hurricanes and obtain 1814 different recorded events.
keys = list(enumerate(pd.unique(data['ID'])))
total_hurricane_count = len(pd.unique(data['ID']))
print("There are "+str(total_hurricane_count)+" hurricanes in the database")
y = np.zeros((total_hurricane_count))
for x in range(0,total_hurricane_count):
    y[x] = len(pd.DataFrame(data[data['ID'] == keys[x][1]], columns = data.keys()).reset_index(drop = True))
hurricane_amount = pd.DataFrame(y)
There are 1814 hurricanes in the database
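An equivalent, more compact way to get the per-hurricane observation counts is a groupby; a small sketch (the variable name is illustrative):
obs_per_hurricane = data.groupby('ID').size()   # observations per hurricane ID
print(obs_per_hurricane.describe())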
The maximum number of observations per hurricane is 133 and the minimum is just 1. The 25th percentile is 14 observations. We drop events that are too short to be useful, keeping only hurricanes with more than 20 observations (the threshold applied below). The mean number of observations per event is 27, and a few events have an extreme number of records, above 100, with 133 observations for the longest event recorded.
hurricane_amount.describe()
0 | |
---|---|
count | 1814.000000 |
mean | 27.070011 |
std | 16.983268 |
min | 1.000000 |
25% | 14.000000 |
50% | 23.000000 |
75% | 37.000000 |
max | 133.000000 |
minObs= (hurricane_amount>20)
keysMinObs=[]
for i in minObs.index:
    if minObs.loc[i][0]:
        keysMinObs.append(keys[i][1])
data = data[data['ID'].isin(keysMinObs)]
keys = list(enumerate(pd.unique(data['ID'])))
After deleting the short hurricanes, we obtain a strongly right-skewed histogram: as expected, most observation counts concentrate between 20 and 40. The long tail of the histogram accounts for the longer hurricanes, which are kept for now. A second model will later be created that also eliminates events above the 75th percentile.
total_hurricane_count = len(pd.unique(data['ID']))
print(total_hurricane_count)
y = np.zeros((total_hurricane_count))
for x in range(0,total_hurricane_count):
    y[x] = len(pd.DataFrame(data[data['ID'] == keys[x][1]], columns = data.keys()).reset_index(drop = True))
hurricane_amount = pd.DataFrame(y)
hurricane_amount.describe()
dist = hurricane_amount.plot.hist(bins=30, edgecolor='white', histtype='barstacked', color='black', legend=None, linewidth=1.2)
plt.show()
1039
The distance traveled since the last observation and the bearing (travel direction based on the current and previous points) are estimated from the original geographic coordinates. For this we transform the coordinate format into signed numeric values according to the quadrant and, from there, calculate the spherical distance and the bearing.
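For reference, the initial (forward) bearing between two consecutive fixes $(\varphi_1, \lambda_1)$ and $(\varphi_2, \lambda_2)$ is given by the standard great-circle formula, with all angles in radians and $\Delta\lambda = \lambda_2 - \lambda_1$:

$$\theta = \operatorname{atan2}\left(\sin\Delta\lambda\,\cos\varphi_2,\; \cos\varphi_1\,\sin\varphi_2 - \sin\varphi_1\,\cos\varphi_2\,\cos\Delta\lambda\right)$$

The result is converted to degrees and shifted into $[0, 360)$, so 0 is due north and 90 is due east; this is what the loop below computes, alongside the great-circle distance from geopy.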
def Nostr(x):
    # Convert coordinates like '28.0N' or '94.8W' into signed floats
    sp = re.split('(N|W|S|E)', x)
    if sp[1] == 'W' or sp[1] == 'S':
        num = float(sp[0]) * -1
    else:
        num = float(sp[0])
    return num
data['Latitude'] = data['Latitude'].apply(Nostr)
data['Longitude'] = data['Longitude'].apply(Nostr)
total_data_count = len(data)
y = np.zeros(total_hurricane_count)
data['distance'] = np.zeros(total_data_count)
data['bearing'] = np.zeros(total_data_count)
for x in range(0, total_hurricane_count):
    t = pd.DataFrame(data[data['ID'] == keys[x][1]], columns = data.keys()).reset_index(drop = False)
    dst = 0
    prev = None
    for idx, lat, lon in zip(t['index'], t['Latitude'], t['Longitude']):
        p = (lat, lon)
        if prev is None:
            prev = p
            continue
        # Great-circle distance from the previous fix, in kilometers
        data.at[idx, 'distance'] = gcd(prev, p).kilometers
        # Initial bearing from the previous fix (standard formula, angles in radians)
        lat1, lat2 = Math.radians(prev[0]), Math.radians(p[0])
        dLon = Math.radians(p[1] - prev[1])
        y_x = Math.sin(dLon) * Math.cos(lat2)
        x_x = Math.cos(lat1) * Math.sin(lat2) - Math.sin(lat1) * Math.cos(lat2) * Math.cos(dLon)
        brng = Math.degrees(Math.atan2(y_x, x_x))
        if brng < 0:
            brng += 360
        data.at[idx, 'bearing'] = brng
        dst += gcd(prev, p).kilometers
        prev = p
    y[x] = dst
hurricane_distance = pd.DataFrame(y)
We now have a table covering the 1039 remaining hurricanes that includes the vectorial behavior of hurricane movement. The total distance traveled shows great variability: some hurricanes travel only about 718 kilometers, the mean total distance is about 4810 kilometers, and some hurricanes traveled up to 13902 kilometers.
data
ID | Date | Time | Latitude | Longitude | Maximum Wind | distance | bearing | |
---|---|---|---|---|---|---|---|---|
16 | AL041851 | 18510816 | 0 | 13.4 | -48.0 | 40 | 0.000000 | 0.000000 |
17 | AL041851 | 18510816 | 600 | 13.7 | -49.5 | 40 | 165.545433 | 326.268781 |
18 | AL041851 | 18510816 | 1200 | 14.0 | -51.0 | 50 | 165.342833 | 349.579563 |
19 | AL041851 | 18510816 | 1800 | 14.4 | -52.8 | 50 | 199.067179 | 161.573455 |
20 | AL041851 | 18510817 | 0 | 14.9 | -54.6 | 60 | 201.466132 | 121.516233 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
49080 | AL112015 | 20151014 | 0 | 36.0 | -9.0 | 20 | 67.316208 | 359.140120 |
49081 | AL112015 | 20151014 | 600 | 35.5 | -8.7 | 20 | 61.838635 | 323.341560 |
49082 | AL112015 | 20151014 | 1200 | 35.1 | -8.4 | 20 | 52.148787 | 210.415062 |
49083 | AL112015 | 20151014 | 1800 | 35.0 | -8.0 | 15 | 38.071961 | 204.811894 |
49084 | AL112015 | 20151015 | 0 | 35.2 | -7.7 | 15 | 35.205672 | 195.594607 |
39255 rows × 8 columns
hurricane_distance.describe()
0 | |
---|---|
count | 1039.000000 |
mean | 4810.571326 |
std | 2367.998761 |
min | 718.601374 |
25% | 3047.282880 |
50% | 4348.485268 |
75% | 6149.359920 |
max | 13902.037275 |
In the following list, the IDs of the top 6 hurricanes that traveled the longest distance are presented along with the distance in kilometers and the number of observations. As expected, hurricanes with a higher number of observations are those that travel the most.
print('Top 6 hurricanes with the longest paths')
for x in hurricane_distance.nlargest(6, 0).index:
    print(keys[x][1], "-", hurricane_distance.loc[x][0], "kilometers -", hurricane_amount.loc[x][0], "observations")
Top 6 hurricanes with the longest paths AL031899 - 13902.037275075161 kilometers - 133.0 observations AL032000 - 13519.340617835665 kilometers - 87.0 observations AL061966 - 13439.368335126639 kilometers - 69.0 observations AL011900 - 12774.077723429358 kilometers - 81.0 observations AL092004 - 12484.604805793771 kilometers - 94.0 observations AL122011 - 12481.711146561658 kilometers - 64.0 observations
The positive correlation between distance and number of observations is best illustrated in the following scatter plot and by Pearson's correlation coefficient, which is 0.77, confirming a strong positive correlation.
print( np.corrcoef(hurricane_distance[0], hurricane_amount[0]) )
corr = plt.scatter(hurricane_distance[0], hurricane_amount[0], color='black')
[[1. 0.77093697] [0.77093697 1. ]]
For illustration, the following figures plot the paths of the 5 longest and the 5 shortest hurricanes. The behavior clearly differs depending on path length, which may make the regression task harder.
n = 5
plt.figure(figsize=(10,5))
m = Basemap(llcrnrlon=-110.,llcrnrlat=5.,urcrnrlon=10.,urcrnrlat=60.,
rsphere=(6378137.00,6356752.3142),
resolution='l',
projection='merc',
lat_0=40.,lon_0=-20.,lat_ts=20.)
m.drawcoastlines()
m.drawcountries()
m.drawmapboundary(fill_color='#99ffff')
m.fillcontinents(color='#cc9966',lake_color='#99ffff')
m.drawparallels(np.arange(10,90,20),labels=[1,1,0,1])
m.drawmeridians(np.arange(-180,180,30),labels=[1,1,0,1])
colo=['red','blue','green','black','orange']
i=0
for x in hurricane_amount.nlargest(n,0).index:
    largest_hurr = data[data['ID'] == keys[x][1]]
    lat = largest_hurr['Latitude'].values
    long = largest_hurr['Longitude'].values
    xpt, ypt = m(long, lat)
    m.plot(xpt, ypt, linewidth=2, color=colo[i])
    i+=1
plt.show()
n = 5
plt.figure(figsize=(10,5))
m = Basemap(llcrnrlon=-110.,llcrnrlat=5.,urcrnrlon=10.,urcrnrlat=60.,
rsphere=(6378137.00,6356752.3142),
resolution='l',
projection='merc',
lat_0=40.,lon_0=-20.,lat_ts=20.)
m.drawcoastlines()
m.drawcountries()
m.drawmapboundary(fill_color='#99ffff')
m.fillcontinents(color='#cc9966',lake_color='#99ffff')
m.drawparallels(np.arange(10,90,20),labels=[1,1,0,1])
m.drawmeridians(np.arange(-180,180,30),labels=[1,1,0,1])
colo=['red','blue','green','black','orange']
i=0
for x in hurricane_amount.nsmallest(n,0).index:
    smallest_hurr = data[data['ID'] == keys[x][1]]
    lat = smallest_hurr['Latitude'].values
    long = smallest_hurr['Longitude'].values
    xpt, ypt = m(long, lat)
    m.plot(xpt, ypt, linewidth=2, color=colo[i])
    i+=1
plt.show()
The distributions of the final features are presented. The distribution of distance traveled per 6-hour interval has positive kurtosis, with most values close to the mean, and is positively skewed by a few extreme values. The bearing distribution shows the most common directions in which hurricanes travel; the odd part is the peak around 180 degrees, which represents due south and may be biased by extreme observations.
dist = data[data['distance'] > 0]['distance']
ser = pd.Series(dist)
ser.plot(kind='kde', cmap='gray')
<AxesSubplot:ylabel='Density'>
direc = data[data['bearing'] > 0]
direc = (direc['bearing'])
ser = pd.Series(direc)
ser.plot(kind='kde', cmap='gray')
<AxesSubplot:ylabel='Density'>
plt.figure(figsize=(10,5))
m = Basemap(llcrnrlon=-110.,llcrnrlat=5.,urcrnrlon=10.,urcrnrlat=60.,
rsphere=(6378137.00,6356752.3142),
resolution='l',
projection='merc',
lat_0=40.,lon_0=-20.,lat_ts=20.)
m.drawcoastlines()
m.drawcountries()
m.drawmapboundary(fill_color='#99ffff')
m.fillcontinents(color='#cc9966',lake_color='#99ffff')
m.drawparallels(np.arange(10,90,20),labels=[1,1,0,1])
m.drawmeridians(np.arange(-180,180,30),labels=[1,1,0,1])
lat = data['Latitude'].values
long = data['Longitude'].values
xpt, ypt = m(long, lat)
m.scatter(xpt, ypt, .3, color='red')
plt.show()
Now, the label is created. The label consists of a grid ID calculated from the coordinates. Some accuracy is lost because values are cast to integers in several parts of the calculation; this is also a generalization step in which only relatively large displacements are captured.
data['gridID'] = np.zeros(total_data_count)
lat_min = 7.2
long_min = -109.3
lat_interval = round(66 - 7.2)
long_interval = round(13.5 + 109.3)
data['gridID'] = np.floor(data['Latitude'] - 7.200)* long_interval + np.floor(data['Longitude'] + 109.3)
data['gridID'] = round(data['gridID'])
names = data['ID'].unique()
data.drop(['Date', 'Time', 'Latitude', 'Longitude'], axis = 1, inplace = True)
data.head()
ID | Maximum Wind | distance | bearing | gridID | |
---|---|---|---|---|---|
16 | AL041851 | 40 | 0.000000 | 0.000000 | 799.0 |
17 | AL041851 | 40 | 165.545433 | 326.268781 | 797.0 |
18 | AL041851 | 50 | 165.342833 | 349.579563 | 796.0 |
19 | AL041851 | 50 | 199.067179 | 161.573455 | 917.0 |
20 | AL041851 | 60 | 201.466132 | 121.516233 | 915.0 |
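For intuition, a grid ID can be mapped back to the lower-left corner of its roughly 1-degree by 1-degree cell. A minimal sketch, assuming the same grid origin (7.2, -109.3) and the 123-column width implied by the code above (names are illustrative):
LON_CELLS = round(13.5 + 109.3)   # 123 one-degree columns, as used in the encoding above
def grid_to_corner(grid_id):
    # Lower-left corner (latitude, longitude) of the cell with this ID
    row, col = divmod(int(grid_id), LON_CELLS)
    return 7.2 + row, -109.3 + col
print(grid_to_corner(799))   # roughly (13.2, -48.3), the cell containing (13.4, -48.0)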
names
array(['AL041851', 'AL011852', 'AL041852', ..., 'AL062015', 'AL102015', 'AL112015'], dtype=object)
scalers = {}
gridScalers = {}
for name in names:
    scalers[name] = MinMaxScaler(feature_range=(0, 1))
    hurricane = data.loc[data['ID'] == name, ['Maximum Wind', 'distance', 'bearing', 'gridID']]
    data.loc[data['ID'] == name, ['Maximum Wind', 'distance', 'bearing', 'gridID']] = scalers[name].fit_transform(hurricane)
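Because each hurricane gets its own scaler, values (or later predictions) in the normalized space can be mapped back to physical units with that scaler's inverse_transform. A minimal illustrative sketch for a single storm:
name = names[0]
scaled = data.loc[data['ID'] == name, ['Maximum Wind', 'distance', 'bearing', 'gridID']].values
original = scalers[name].inverse_transform(scaled)   # back to wind speed, kilometers, degrees, grid ID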
The following lines create a padded data structure, which is then converted into a padded sequence array with Keras, used to manage and predict sequential observations of varying length.
data_pad = []
for key in np.unique(data['ID']):
    data_pad += [data[data.loc[:, 'ID'] == key].loc[:, ['Maximum Wind','distance', 'bearing', 'gridID']]]
padded_data = keras.preprocessing.sequence.pad_sequences(data_pad, maxlen=60, dtype='float', padding='post', truncating='pre', value=0.0)
print(len(padded_data[0]))
print(len(padded_data))
padded_data
60 1039
array([[[3.33333333e-01, 0.00000000e+00, 0.00000000e+00, 1.21852153e-03], [3.33333333e-01, 4.45292032e-01, 6.05519716e-02, 8.12347685e-04], [3.33333333e-01, 3.98119661e-01, 5.70225160e-02, 4.06173842e-04], ..., [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 7.93650794e-03], [0.00000000e+00, 2.84649530e-01, 4.42534039e-01, 6.61375661e-03], [1.42857143e-01, 2.05527817e-01, 4.35757391e-01, 6.61375661e-03], ..., [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 6.11081810e-01, 3.67786230e-01, 4.55893254e-02], [0.00000000e+00, 5.81227832e-01, 1.89498076e-01, 4.52186805e-02], ..., [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], ..., [[0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 3.44690517e-01], [0.00000000e+00, 1.00079070e-01, 5.01424950e-01, 3.44690517e-01], [0.00000000e+00, 1.33050103e-01, 5.01270684e-01, 3.45258376e-01], ..., [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[4.00000000e-01, 0.00000000e+00, 0.00000000e+00, 6.91808597e-01], [4.00000000e-01, 5.81923475e-01, 7.56319799e-01, 6.90997567e-01], [4.00000000e-01, 6.00322425e-01, 8.28346716e-01, 6.90186537e-01], ..., [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[1.66666667e-01, 0.00000000e+00, 0.00000000e+00, 4.24092409e-01], [5.00000000e-01, 2.70432545e-01, 5.83997261e-01, 6.27062706e-01], [6.66666667e-01, 3.34688951e-01, 5.73023215e-01, 6.25412541e-01], ..., [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]]])
The padded data is reshaped into a two-dimensional array, keeping the 4 features defined.
temp_flat = padded_data.reshape(60*1039,4)
temp_flat
array([[3.33333333e-01, 0.00000000e+00, 0.00000000e+00, 1.21852153e-03], [3.33333333e-01, 4.45292032e-01, 6.05519716e-02, 8.12347685e-04], [3.33333333e-01, 3.98119661e-01, 5.70225160e-02, 4.06173842e-04], ..., [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]])
Now the model architecture is defined. As described by the original authors, five layers are employed: an input layer, three hidden vector-state layers, and an output layer. The input layer takes in a data tuple, that is, a sequence of features. The output block contains an LSTM unit with dropout, a dense layer, and the final activation layer.
def load_data(stock, seq_len, amount_of_features):
    # Build overlapping windows of length seq_len + 1; the last row of each
    # window provides the target (its last feature, the scaled grid ID)
    sequence_length = seq_len + 1
    result = []
    for index in range(len(stock) - sequence_length):
        result.append(stock[index: index + sequence_length])
    result = np.array(result)
    # 90/10 split into train and test sets, preserving window order
    row = int(len(result) * 0.9)
    train = result[:row, :]
    x_train = train[:, :-1]
    y_train = train[:, -1][:, -1]
    x_test = result[row:, :-1]
    y_test = result[row:, -1][:, -1]
    x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], amount_of_features))
    x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], amount_of_features))
    return [x_train, y_train, x_test, y_test]
def build_model(layers):
    # layers = [feature count, units in the stacked LSTMs, output units]
    model = Sequential()
    for x in range(0, 3):
        model.add(LSTM(units=layers[1], input_shape=(None, layers[0]), return_sequences=True))
        model.add(Dropout(0.1))
    model.add(LSTM(layers[2], return_sequences=False))
    model.add(Dropout(0.1))
    model.add(Dense(units=layers[2]))
    model.add(Activation("tanh"))
    start = time.time()
    opt = tf.keras.optimizers.SGD(learning_rate=1)
    model.compile(loss="mse", optimizer=opt, metrics=['accuracy'])
    print("Compilation Time : ", time.time() - start)
    return model
The data is divided into train and test sets.
seq_len = 10
feature_count = 4
X_train, y_train, X_test, y_test = load_data(temp_flat[::-1], seq_len, feature_count)
print("X_train", X_train.shape)
print("y_train", y_train.shape)
print("X_test", X_test.shape)
print("y_test", y_test.shape)
X_train (56096, 10, 4) y_train (56096,) X_test (6233, 10, 4) y_test (6233,)
The model is built and compiled.
model = build_model([feature_count, seq_len, 1])
Compilation Time : 0.8611505031585693
And trained. The accuracy on the validation set does not vary significantly along the training process, although it improves slowly; the loss tends to decrease, but not by much. After trying several combinations of learning rate, activation function, and loss function, this was the best result, and for reasons of computational capacity no more epochs were run.
model.fit(X_train, y_train, batch_size=512, epochs=100, validation_split=0.1, verbose=1)
Epoch 1/100 99/99 [==============================] - 41s 321ms/step - loss: 0.1139 - accuracy: 0.3959 - val_loss: 0.1057 - val_accuracy: 0.4262 Epoch 2/100 99/99 [==============================] - 31s 314ms/step - loss: 0.1106 - accuracy: 0.3993 - val_loss: 0.1044 - val_accuracy: 0.4262 Epoch 3/100 99/99 [==============================] - 31s 315ms/step - loss: 0.1175 - accuracy: 0.3905 - val_loss: 0.0834 - val_accuracy: 0.4250 Epoch 4/100 99/99 [==============================] - 31s 316ms/step - loss: 0.0992 - accuracy: 0.3947 - val_loss: 0.0660 - val_accuracy: 0.4271 Epoch 5/100 99/99 [==============================] - 32s 323ms/step - loss: 0.0857 - accuracy: 0.3970 - val_loss: 0.0866 - val_accuracy: 0.4273 Epoch 6/100 99/99 [==============================] - 32s 325ms/step - loss: 0.0693 - accuracy: 0.4003 - val_loss: 0.0429 - val_accuracy: 0.4289 Epoch 7/100 99/99 [==============================] - 31s 317ms/step - loss: 0.0594 - accuracy: 0.4023 - val_loss: 0.0412 - val_accuracy: 0.4299 Epoch 8/100 99/99 [==============================] - 33s 330ms/step - loss: 0.0485 - accuracy: 0.4032 - val_loss: 0.0391 - val_accuracy: 0.4301 Epoch 9/100 99/99 [==============================] - 31s 311ms/step - loss: 0.0379 - accuracy: 0.4039 - val_loss: 0.0412 - val_accuracy: 0.4305 Epoch 10/100 99/99 [==============================] - 31s 312ms/step - loss: 0.0388 - accuracy: 0.4041 - val_loss: 0.0206 - val_accuracy: 0.4307 Epoch 11/100 99/99 [==============================] - 31s 317ms/step - loss: 0.0340 - accuracy: 0.4044 - val_loss: 0.0208 - val_accuracy: 0.4307 Epoch 12/100 99/99 [==============================] - 32s 319ms/step - loss: 0.0332 - accuracy: 0.4049 - val_loss: 0.0242 - val_accuracy: 0.4307 Epoch 13/100 99/99 [==============================] - 31s 308ms/step - loss: 0.0309 - accuracy: 0.4047 - val_loss: 0.0193 - val_accuracy: 0.4307 Epoch 14/100 99/99 [==============================] - 31s 316ms/step - loss: 0.0313 - accuracy: 0.4046 - val_loss: 0.0199 - val_accuracy: 0.4308 Epoch 15/100 99/99 [==============================] - 31s 317ms/step - loss: 0.0318 - accuracy: 0.4046 - val_loss: 0.0234 - val_accuracy: 0.4307 Epoch 16/100 99/99 [==============================] - 32s 327ms/step - loss: 0.0303 - accuracy: 0.4045 - val_loss: 0.0180 - val_accuracy: 0.4307 Epoch 17/100 99/99 [==============================] - 32s 323ms/step - loss: 0.0313 - accuracy: 0.4046 - val_loss: 0.0187 - val_accuracy: 0.4308 Epoch 18/100 99/99 [==============================] - 31s 309ms/step - loss: 0.0301 - accuracy: 0.4049 - val_loss: 0.0196 - val_accuracy: 0.4308 Epoch 19/100 99/99 [==============================] - 32s 320ms/step - loss: 0.0291 - accuracy: 0.4050 - val_loss: 0.0179 - val_accuracy: 0.4307 Epoch 20/100 99/99 [==============================] - 33s 329ms/step - loss: 0.0309 - accuracy: 0.4047 - val_loss: 0.0177 - val_accuracy: 0.4307 Epoch 21/100 99/99 [==============================] - 32s 322ms/step - loss: 0.0300 - accuracy: 0.4047 - val_loss: 0.0179 - val_accuracy: 0.4307 Epoch 22/100 99/99 [==============================] - 31s 315ms/step - loss: 0.0295 - accuracy: 0.4047 - val_loss: 0.0193 - val_accuracy: 0.4310 Epoch 23/100 99/99 [==============================] - 32s 319ms/step - loss: 0.0310 - accuracy: 0.4049 - val_loss: 0.0189 - val_accuracy: 0.4307 Epoch 24/100 99/99 [==============================] - 32s 321ms/step - loss: 0.0324 - accuracy: 0.4047 - val_loss: 0.0230 - val_accuracy: 0.4310 Epoch 25/100 99/99 [==============================] - 32s 319ms/step - loss: 0.0308 
- accuracy: 0.4046 - val_loss: 0.0179 - val_accuracy: 0.4307 Epoch 26/100 99/99 [==============================] - 31s 314ms/step - loss: 0.0296 - accuracy: 0.4046 - val_loss: 0.0203 - val_accuracy: 0.4307 Epoch 27/100 99/99 [==============================] - 32s 322ms/step - loss: 0.0298 - accuracy: 0.4048 - val_loss: 0.0176 - val_accuracy: 0.4307 Epoch 28/100 99/99 [==============================] - 32s 327ms/step - loss: 0.0297 - accuracy: 0.4048 - val_loss: 0.0192 - val_accuracy: 0.4307 Epoch 29/100 99/99 [==============================] - 32s 319ms/step - loss: 0.0304 - accuracy: 0.4048 - val_loss: 0.0189 - val_accuracy: 0.4307 Epoch 30/100 99/99 [==============================] - 31s 318ms/step - loss: 0.0304 - accuracy: 0.4049 - val_loss: 0.0186 - val_accuracy: 0.4310 Epoch 31/100 99/99 [==============================] - 31s 318ms/step - loss: 0.0293 - accuracy: 0.4048 - val_loss: 0.0265 - val_accuracy: 0.4307 Epoch 32/100 99/99 [==============================] - 31s 309ms/step - loss: 0.0301 - accuracy: 0.4047 - val_loss: 0.0200 - val_accuracy: 0.4307 Epoch 33/100 99/99 [==============================] - 31s 308ms/step - loss: 0.0310 - accuracy: 0.4047 - val_loss: 0.0176 - val_accuracy: 0.4308 Epoch 34/100 99/99 [==============================] - 31s 313ms/step - loss: 0.0285 - accuracy: 0.4048 - val_loss: 0.0187 - val_accuracy: 0.4310 Epoch 35/100 99/99 [==============================] - 31s 311ms/step - loss: 0.0305 - accuracy: 0.4048 - val_loss: 0.0225 - val_accuracy: 0.4310 Epoch 36/100 99/99 [==============================] - 31s 309ms/step - loss: 0.0289 - accuracy: 0.4048 - val_loss: 0.0177 - val_accuracy: 0.4308 Epoch 37/100 99/99 [==============================] - 31s 314ms/step - loss: 0.0289 - accuracy: 0.4049 - val_loss: 0.0171 - val_accuracy: 0.4307 Epoch 38/100 99/99 [==============================] - 31s 311ms/step - loss: 0.0307 - accuracy: 0.4046 - val_loss: 0.0198 - val_accuracy: 0.4307 Epoch 39/100 99/99 [==============================] - 32s 328ms/step - loss: 0.0314 - accuracy: 0.4048 - val_loss: 0.0177 - val_accuracy: 0.4307 Epoch 40/100 99/99 [==============================] - 31s 314ms/step - loss: 0.0285 - accuracy: 0.4047 - val_loss: 0.0182 - val_accuracy: 0.4307 Epoch 41/100 99/99 [==============================] - 31s 314ms/step - loss: 0.0303 - accuracy: 0.4045 - val_loss: 0.0176 - val_accuracy: 0.4308 Epoch 42/100 99/99 [==============================] - 32s 319ms/step - loss: 0.0290 - accuracy: 0.4049 - val_loss: 0.0199 - val_accuracy: 0.4307 Epoch 43/100 99/99 [==============================] - 33s 330ms/step - loss: 0.0307 - accuracy: 0.4047 - val_loss: 0.0201 - val_accuracy: 0.4310 Epoch 44/100 99/99 [==============================] - 31s 317ms/step - loss: 0.0294 - accuracy: 0.4047 - val_loss: 0.0210 - val_accuracy: 0.4307 Epoch 45/100 99/99 [==============================] - 33s 337ms/step - loss: 0.0305 - accuracy: 0.4048 - val_loss: 0.0213 - val_accuracy: 0.4307 Epoch 46/100 99/99 [==============================] - 31s 314ms/step - loss: 0.0289 - accuracy: 0.4048 - val_loss: 0.0184 - val_accuracy: 0.4310 Epoch 47/100 99/99 [==============================] - 31s 312ms/step - loss: 0.0295 - accuracy: 0.4047 - val_loss: 0.0186 - val_accuracy: 0.4307 Epoch 48/100 99/99 [==============================] - 31s 310ms/step - loss: 0.0308 - accuracy: 0.4046 - val_loss: 0.0265 - val_accuracy: 0.4310 Epoch 49/100 99/99 [==============================] - 31s 311ms/step - loss: 0.0286 - accuracy: 0.4051 - val_loss: 0.0184 - val_accuracy: 0.4307 Epoch 50/100 
99/99 [==============================] - 31s 313ms/step - loss: 0.0294 - accuracy: 0.4048 - val_loss: 0.0175 - val_accuracy: 0.4307 Epoch 51/100 99/99 [==============================] - 31s 316ms/step - loss: 0.0317 - accuracy: 0.4045 - val_loss: 0.0189 - val_accuracy: 0.4307 Epoch 52/100 99/99 [==============================] - 31s 310ms/step - loss: 0.0295 - accuracy: 0.4048 - val_loss: 0.0191 - val_accuracy: 0.4310 Epoch 53/100 99/99 [==============================] - 31s 311ms/step - loss: 0.0299 - accuracy: 0.4049 - val_loss: 0.0172 - val_accuracy: 0.4307 Epoch 54/100 99/99 [==============================] - 31s 317ms/step - loss: 0.0292 - accuracy: 0.4048 - val_loss: 0.0176 - val_accuracy: 0.4307 Epoch 55/100 99/99 [==============================] - 32s 325ms/step - loss: 0.0302 - accuracy: 0.4049 - val_loss: 0.0208 - val_accuracy: 0.4307 Epoch 56/100 99/99 [==============================] - 31s 311ms/step - loss: 0.0285 - accuracy: 0.4048 - val_loss: 0.0179 - val_accuracy: 0.4307 Epoch 57/100 99/99 [==============================] - 30s 308ms/step - loss: 0.0306 - accuracy: 0.4050 - val_loss: 0.0216 - val_accuracy: 0.4307 Epoch 58/100 99/99 [==============================] - 31s 309ms/step - loss: 0.0281 - accuracy: 0.4048 - val_loss: 0.0181 - val_accuracy: 0.4310 Epoch 59/100 99/99 [==============================] - 32s 319ms/step - loss: 0.0299 - accuracy: 0.4050 - val_loss: 0.0222 - val_accuracy: 0.4310 Epoch 60/100 99/99 [==============================] - 31s 312ms/step - loss: 0.0300 - accuracy: 0.4047 - val_loss: 0.0198 - val_accuracy: 0.4310 Epoch 61/100 99/99 [==============================] - 31s 316ms/step - loss: 0.0287 - accuracy: 0.4048 - val_loss: 0.0186 - val_accuracy: 0.4307 Epoch 62/100 99/99 [==============================] - 31s 311ms/step - loss: 0.0300 - accuracy: 0.4050 - val_loss: 0.0259 - val_accuracy: 0.4305 Epoch 63/100 99/99 [==============================] - 31s 311ms/step - loss: 0.0293 - accuracy: 0.4049 - val_loss: 0.0171 - val_accuracy: 0.4308 Epoch 64/100 99/99 [==============================] - 31s 314ms/step - loss: 0.0286 - accuracy: 0.4047 - val_loss: 0.0204 - val_accuracy: 0.4310 Epoch 65/100 99/99 [==============================] - 31s 310ms/step - loss: 0.0282 - accuracy: 0.4048 - val_loss: 0.0183 - val_accuracy: 0.4310 Epoch 66/100 99/99 [==============================] - 31s 312ms/step - loss: 0.0295 - accuracy: 0.4045 - val_loss: 0.0185 - val_accuracy: 0.4310 Epoch 67/100 99/99 [==============================] - 31s 315ms/step - loss: 0.0305 - accuracy: 0.4047 - val_loss: 0.0234 - val_accuracy: 0.4305 Epoch 68/100 99/99 [==============================] - 31s 312ms/step - loss: 0.0303 - accuracy: 0.4048 - val_loss: 0.0170 - val_accuracy: 0.4308 Epoch 69/100 99/99 [==============================] - 31s 310ms/step - loss: 0.0285 - accuracy: 0.4046 - val_loss: 0.0180 - val_accuracy: 0.4307 Epoch 70/100 99/99 [==============================] - 32s 323ms/step - loss: 0.0277 - accuracy: 0.4051 - val_loss: 0.0176 - val_accuracy: 0.4307 Epoch 71/100 99/99 [==============================] - 33s 332ms/step - loss: 0.0299 - accuracy: 0.4048 - val_loss: 0.0264 - val_accuracy: 0.4310 Epoch 72/100 99/99 [==============================] - 32s 318ms/step - loss: 0.0283 - accuracy: 0.4050 - val_loss: 0.0169 - val_accuracy: 0.4308 Epoch 73/100 99/99 [==============================] - 32s 319ms/step - loss: 0.0290 - accuracy: 0.4050 - val_loss: 0.0175 - val_accuracy: 0.4307 Epoch 74/100 99/99 [==============================] - 34s 339ms/step - loss: 0.0301 - 
accuracy: 0.4046 - val_loss: 0.0185 - val_accuracy: 0.4310 Epoch 75/100 99/99 [==============================] - 31s 316ms/step - loss: 0.0286 - accuracy: 0.4047 - val_loss: 0.0220 - val_accuracy: 0.4310 Epoch 76/100 99/99 [==============================] - 31s 312ms/step - loss: 0.0290 - accuracy: 0.4049 - val_loss: 0.0179 - val_accuracy: 0.4310 Epoch 77/100 99/99 [==============================] - 31s 314ms/step - loss: 0.0289 - accuracy: 0.4049 - val_loss: 0.0200 - val_accuracy: 0.4310 Epoch 78/100 99/99 [==============================] - 31s 309ms/step - loss: 0.0285 - accuracy: 0.4049 - val_loss: 0.0209 - val_accuracy: 0.4310 Epoch 79/100 99/99 [==============================] - 31s 311ms/step - loss: 0.0288 - accuracy: 0.4048 - val_loss: 0.0203 - val_accuracy: 0.4307 Epoch 80/100 99/99 [==============================] - 31s 310ms/step - loss: 0.0302 - accuracy: 0.4047 - val_loss: 0.0175 - val_accuracy: 0.4310 Epoch 81/100 99/99 [==============================] - 31s 314ms/step - loss: 0.0288 - accuracy: 0.4049 - val_loss: 0.0193 - val_accuracy: 0.4310 Epoch 82/100 99/99 [==============================] - 31s 312ms/step - loss: 0.0302 - accuracy: 0.4049 - val_loss: 0.0171 - val_accuracy: 0.4307 Epoch 83/100 99/99 [==============================] - 31s 315ms/step - loss: 0.0269 - accuracy: 0.4048 - val_loss: 0.0173 - val_accuracy: 0.4308 Epoch 84/100 99/99 [==============================] - 31s 315ms/step - loss: 0.0293 - accuracy: 0.4050 - val_loss: 0.0170 - val_accuracy: 0.4308 Epoch 85/100 99/99 [==============================] - 31s 314ms/step - loss: 0.0276 - accuracy: 0.4049 - val_loss: 0.0219 - val_accuracy: 0.4307 Epoch 86/100 99/99 [==============================] - 31s 312ms/step - loss: 0.0288 - accuracy: 0.4048 - val_loss: 0.0219 - val_accuracy: 0.4307 Epoch 87/100 99/99 [==============================] - 31s 316ms/step - loss: 0.0288 - accuracy: 0.4049 - val_loss: 0.0185 - val_accuracy: 0.4307 Epoch 88/100 99/99 [==============================] - 31s 317ms/step - loss: 0.0286 - accuracy: 0.4047 - val_loss: 0.0192 - val_accuracy: 0.4307 Epoch 89/100 99/99 [==============================] - 31s 311ms/step - loss: 0.0282 - accuracy: 0.4048 - val_loss: 0.0177 - val_accuracy: 0.4310 Epoch 90/100 99/99 [==============================] - 31s 312ms/step - loss: 0.0283 - accuracy: 0.4048 - val_loss: 0.0188 - val_accuracy: 0.4310 Epoch 91/100 99/99 [==============================] - 31s 313ms/step - loss: 0.0298 - accuracy: 0.4048 - val_loss: 0.0254 - val_accuracy: 0.4305 Epoch 92/100 99/99 [==============================] - 31s 314ms/step - loss: 0.0288 - accuracy: 0.4050 - val_loss: 0.0175 - val_accuracy: 0.4307 Epoch 93/100 99/99 [==============================] - 31s 314ms/step - loss: 0.0284 - accuracy: 0.4047 - val_loss: 0.0185 - val_accuracy: 0.4307 Epoch 94/100 99/99 [==============================] - 31s 317ms/step - loss: 0.0290 - accuracy: 0.4048 - val_loss: 0.0171 - val_accuracy: 0.4308 Epoch 95/100 99/99 [==============================] - 32s 322ms/step - loss: 0.0286 - accuracy: 0.4049 - val_loss: 0.0181 - val_accuracy: 0.4307 Epoch 96/100 99/99 [==============================] - 32s 322ms/step - loss: 0.0292 - accuracy: 0.4049 - val_loss: 0.0174 - val_accuracy: 0.4308 Epoch 97/100 99/99 [==============================] - 32s 328ms/step - loss: 0.0285 - accuracy: 0.4045 - val_loss: 0.0182 - val_accuracy: 0.4307 Epoch 98/100 99/99 [==============================] - 31s 312ms/step - loss: 0.0287 - accuracy: 0.4047 - val_loss: 0.0199 - val_accuracy: 0.4310 Epoch 99/100 
99/99 [==============================] - 31s 313ms/step - loss: 0.0291 - accuracy: 0.4048 - val_loss: 0.0221 - val_accuracy: 0.4307 Epoch 100/100 99/99 [==============================] - 31s 314ms/step - loss: 0.0278 - accuracy: 0.4048 - val_loss: 0.0172 - val_accuracy: 0.4307
<keras.callbacks.History at 0x1bb9efd3b20>
After evaluating the model on the test dataset, the MSE and RMSE remain essentially the same as on the training set. The accuracy achieved was around 41 percent on the test data. The accuracy reported in the original publication was about 60 percent; the difference might be due to the number of samples used, the thresholds for eliminating events, or the features used.
trainScore = model.evaluate(X_train, y_train, verbose=0)
print('Train Score: %.4f MSE (%.4f RMSE)' % (trainScore[0], Math.sqrt(trainScore[0])))
testScore = model.evaluate(X_test, y_test, verbose=0)
print('Test Score: %.4f MSE (%.4f RMSE)' % (testScore[0], Math.sqrt(testScore[0])))
Train Score: 0.0173 MSE (0.1314 RMSE) Test Score: 0.0174 MSE (0.1318 RMSE)
The model is saved to disk and then reloaded to make some predictions on the test data.
model_json = model.to_json()
with open("model.json", "w") as json_file:
json_file.write(model_json)
model.save_weights("model.h5")
print("Saved Model to Disk")
Saved Model to Disk
# Load json and create model
json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# Load weights into new model
loaded_model.load_weights("model.h5")
print("Loaded Model from Disk")
model = loaded_model
WARNING:tensorflow:Layer lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU. WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU. WARNING:tensorflow:Layer lstm_2 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU. WARNING:tensorflow:Layer lstm_3 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU. Loaded Model from Disk
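As an aside, recent Keras versions can also save and reload the whole model (architecture plus weights) in a single call; a minimal sketch with an illustrative file name:
model.save("hurricane_lstm.h5")                       # architecture + weights in one file
restored = keras.models.load_model("hurricane_lstm.h5")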
pred = model.predict(X_test)
In this graph the real and predicted grid locations are compared. Overall, the fit is appropriate, except in the critical zones where grid values are low or high; since the values are normalized, predictions near 0 and 1 show the largest error. Zooming in (the blue-line graph) makes clear that small errors also exist at every location, but the trend holds: extreme grid-ID values show the biggest shifts.
plt.figure(figsize=(15, 4), dpi=100)
plt.plot(pred[:1000],
linewidth=1,
color='red',
label='Predicted Grid Locations')
plt.plot(y_test[:1000],
linewidth=1,
color='black',
marker='+',
markersize=4,
label='Real Grid Locations')
plt.xlabel('Data Tuple')
plt.ylabel('Grid Locations')
plt.legend(loc='upper left')
plt.show()
plt.figure(figsize=(15, 4), dpi=100)
plt.plot(pred[:150],
linewidth=1,
color='blue',
label='Predicted Grid Locations')
plt.plot(y_test[:150],
linewidth=1,
color='black',
marker='+',
markersize=4,
label='Real Grid Locations')
plt.xlabel('Data Tuple')
plt.ylabel('Grid Locations')
plt.legend(loc='upper left')
plt.show()
As conclusions:
Alemany, S., Beltran, J., Perez, A., & Ganzfried, S. (2019, July). Predicting hurricane trajectories using a recurrent neural network. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, No. 01, pp. 468-475).