I am working on a binary classification problem for seizure detection. I split the data into training, validation, and test sets with the following sizes and shapes:
dataset_X = (154182, 32, 9, 19), dataset_y = (154182, 1).
The unique values and counts for dataset_y are (array([0, 1]), array([77127, 77055])), so the two classes are balanced.
The data is then split into 92508, 30837 and 30837 samples for training, validation and testing respectively.
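For reference, a split with those sizes could be produced with a two-step stratified split roughly like this (a minimal sketch assuming scikit-learn's train_test_split; the exact proportions and random_state are assumptions, and the actual split code is not shown here):

from sklearn.model_selection import train_test_split

# Roughly 60/20/20 stratified split (the question reports 92508 / 30837 / 30837)
X_train, X_tmp, Y_train, Y_tmp = train_test_split(
    dataset_X, dataset_y, test_size=0.4, stratify=dataset_y, random_state=42)
X_val, X_test, Y_val, Y_test = train_test_split(
    X_tmp, Y_tmp, test_size=0.5, stratify=Y_tmp, random_state=42)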
The configuration using categorical_crossentropy with a final Dense layer of size 2 and a softmax activation works very well. However, if I use binary_crossentropy with a final Dense layer of size 1 and a sigmoid activation, training and validation report almost the same results, but the predictions on the test dataset are completely wrong.
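To make the comparison concrete, these are the two output-head / loss / label combinations being compared (a minimal sketch only; features stands for the output of the shared layers shown below):

# Option A: 2-unit softmax head, categorical_crossentropy, one-hot labels of shape (N, 2)
out_a = Dense(2, activation='softmax')(features)

# Option B: 1-unit sigmoid head, binary_crossentropy, 0/1 labels of shape (N, 1)
out_b = Dense(1, activation='sigmoid')(features)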
For the softmax model:
The Model:
import keras
from keras.layers import (Input, BatchNormalization, Lambda, Convolution3D, Convolution2D,
                          MaxPooling2D, Activation, Flatten, Dropout, Dense)
from keras.models import Model

def create_cnn_model(X_train_shape, nb_classes):
    inputs = Input(shape=X_train_shape[1:])
    normal1 = BatchNormalization(axis=-1)(inputs)
    # Add a trailing channel dimension so the 3D convolution can span the last data axis
    reshape1 = Lambda(lambda x: keras.backend.expand_dims(x, axis=-1))(normal1)
    conv1 = Convolution3D(
        32, (3, 3, X_train_shape[-1]), data_format='channels_last',
        padding='valid', strides=(1, 1, 1))(reshape1)
    # Squeeze out the collapsed axis to return to a 2D feature map
    reshape2 = Lambda(lambda x: keras.backend.squeeze(x, axis=-2))(conv1)
    relu1 = Activation('relu')(reshape2)
    pool1 = MaxPooling2D(pool_size=(2, 1), data_format='channels_last')(relu1)
    normal2 = BatchNormalization(axis=-1)(pool1)
    conv2 = Convolution2D(
        64, (3, 3), data_format='channels_last',
        padding='valid', strides=(1, 1))(normal2)
    relu2 = Activation('relu')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 1), data_format='channels_last')(relu2)
    normal3 = BatchNormalization(axis=-1)(pool2)
    conv3 = Convolution2D(
        64, (3, 3), data_format='channels_last',
        padding='valid', strides=(1, 1))(normal3)
    relu3 = Activation('relu')(conv3)
    flat = Flatten()(relu3)
    drop1 = Dropout(0.5)(flat)
    dens1 = Dense(256, activation='relu')(drop1)
    drop2 = Dropout(0.5)(dens1)
    dens2 = Dense(nb_classes)(drop2)
    last = Activation('softmax')(dens2)
    model = Model(inputs=inputs, outputs=last)
    return model
The code that creates the model and initiates the training:
import numpy as np
from keras.optimizers import Adam
from keras.utils import np_utils

cnn_model = create_cnn_model(X_train.shape, nb_classes)
adam = Adam(lr=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
cnn_model.compile(loss='categorical_crossentropy',
                  optimizer=adam,
                  metrics=['accuracy', 'Recall', 'Precision', 'AUC'])
Y_train = Y_train.astype('uint8')
# One-hot encode the labels to shape (N, 2) for categorical_crossentropy
Y_train = np_utils.to_categorical(Y_train, nb_classes)
Y_val = np_utils.to_categorical(Y_val, nb_classes)
cnn_model.fit(X_train, Y_train, batch_size=32, epochs=10, validation_data=(X_val, Y_val))
predictions = cnn_model.predict(X_test, verbose=1)
# Convert the predicted class probabilities back to one-hot vectors
y_pred = np_utils.to_categorical(np.argmax(predictions, axis=1), nb_classes)
y_true = np_utils.to_categorical(Y_test, nb_classes)
# Converting one-hot (categorical) predictions back to numerical class labels
y_pred_s = y_pred.argmax(1)
y_true_s = y_true.argmax(1)
print(np.unique(y_pred_s, return_counts=True))
print(np.unique(y_true_s, return_counts=True))
print(y_pred.shape, y_true.shape)

from sklearn.metrics import f1_score, accuracy_score, recall_score, precision_score, roc_auc_score
acc_scr = accuracy_score(y_true, y_pred)
pre_scr = precision_score(y_true, y_pred, average='micro')
rec_scr = recall_score(y_true, y_pred, average='micro')
roc_auc = roc_auc_score(y_true, y_pred, average='micro')  # renamed so the sklearn function is not shadowed
f1_test = f1_score(y_true, y_pred, average='weighted')
The training details and testing results after 10 epochs:
Shape: x_train, y_train, X_val, y_val
(92508, 32, 9, 19) (92508, 2) (30837, 32, 9, 19) (30837, 2)
Epoch 1/10
2891/2891 [==============================] - 63s 19ms/step - loss: 0.8718 - accuracy: 0.8860 - recall: 0.8860 - precision: 0.8860 - auc: 0.9474 - val_loss: 0.1635 - val_accuracy: 0.9414 - val_recall: 0.9414 - val_precision: 0.9414 - val_auc: 0.9824
Epoch 2/10
2891/2891 [==============================] - 53s 18ms/step - loss: 0.3728 - accuracy: 0.9361 - recall: 0.9361 - precision: 0.9361 - auc: 0.9813 - val_loss: 0.1891 - val_accuracy: 0.9251 - val_recall: 0.9251 - val_precision: 0.9251 - val_auc: 0.9791
...
Epoch 10/10
2891/2891 [==============================] - 48s 17ms/step - loss: 0.1377 - accuracy: 0.9774 - recall: 0.9774 - precision: 0.9774 - auc: 0.9967 - val_loss: 0.0354 - val_accuracy: 0.9864 - val_recall: 0.9864 - val_precision: 0.9864 - val_auc: 0.9986
964/964 [==============================] - 3s 3ms/step
Shape: X_test, y_test, y_pred
(30837, 32, 9, 19) (30837, 2) (30837, 2)
Accuracy: 0.9854719979245712
Recall: 0.9854719979245712
Precision: 0.9854719979245712
ROC AUC: 0.9854719979245712
For the sigmoid model:
The model: it is the same as the one above, but with the final two layers changed to:
dens2 = Dense(1)(drop2)
last = Activation('sigmoid')(dens2)
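For clarity, the head switch could be factored into a small helper like the sketch below (use_sigmoid is a hypothetical flag; in the actual code the two lines above are simply edited in place):

def output_head(x, use_sigmoid, nb_classes=2):
    # use_sigmoid is a hypothetical flag controlling which head/loss pairing is built
    if use_sigmoid:
        return Dense(1, activation='sigmoid')(x)        # pairs with binary_crossentropy
    return Dense(nb_classes, activation='softmax')(x)   # pairs with categorical_crossentropy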
The code that creates the model and initiates the training:
cnn_model = create_cnn_model(X_train.shape, nb_classes)  # nb_classes is not used by the sigmoid head
adam = Adam(lr=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
cnn_model.compile(loss='binary_crossentropy',
                  optimizer=adam,
                  metrics=['accuracy', 'Recall', 'Precision', 'AUC'])
cnn_model.fit(X_train, Y_train, batch_size=32, epochs=10, validation_data=(X_val, Y_val))
predictions = cnn_model.predict(X_test, verbose=1)
y_pred = np.argmax(predictions, axis=1)
y_true = Y_test
print(y_pred.shape, y_true.shape)

from sklearn.metrics import f1_score, accuracy_score, recall_score, precision_score, roc_auc_score
acc_scr = accuracy_score(y_true, y_pred)
pre_scr = precision_score(y_true, y_pred)
rec_scr = recall_score(y_true, y_pred)
roc_auc = roc_auc_score(y_true, y_pred)  # renamed so the sklearn function is not shadowed
f1_test = f1_score(y_true, y_pred, average='weighted')
The training details and testing results after 10 epochs:
Shape: x_train, y_train, X_val, y_val
(92508, 32, 9, 19) (92508, 1) (30837, 32, 9, 19) (30837, 1)
Epoch 1/10
2891/2891 [==============================] - 80s 24ms/step - loss: 0.0284 - accuracy: 0.9920 - recall: 0.2655 - precision: 0.5381 - auc: 0.9277 - val_loss: 0.0156 - val_accuracy: 0.9955 - val_recall: 0.5370 - val_precision: 0.8734 - val_auc: 0.9432
Epoch 2/10
2891/2891 [==============================] - 60s 21ms/step - loss: 0.0129 - accuracy: 0.9959 - recall: 0.6269 - precision: 0.8476 - auc: 0.9800 - val_loss: 0.0079 - val_accuracy: 0.9974 - val_recall: 0.7860 - val_precision: 0.8899 - val_auc: 0.9873
...
Epoch 10/10
2891/2891 [==============================] - 50s 17ms/step - loss: 0.0853 - accuracy: 0.9660 - recall: 0.9665 - precision: 0.9655 - auc: 0.9952 - val_loss: 0.0865 - val_accuracy: 0.9648 - val_recall: 0.9615 - val_precision: 0.9679 - val_auc: 0.9949
964/964 [==============================] - 3s 3ms/step
Shape: X_test, y_test, y_pred
(30837, 32, 9, 19) (30837, 1) (30837,)
Accuracy: 0.5002432143204592
Recall: 0.0
Precision: 0.0
ROC AUC: 0.5
F1-weighted score: 0.33360360651524557
When printing the unique values and counts of y_true and y_pred for the softmax model, after converting them from one-hot back to numerical labels, I get:
y_true:(array([0, 1], dtype=int64), array([15426, 15411], dtype=int64))
y_pred: (array([0, 1], dtype=int64), array([15360, 15477], dtype=int64))
However, when I run the same for the sigmoid model, I get:
y_true: (array([0, 1], dtype=uint8), array([15426, 15411], dtype=int64))
y_pred: (array([0], dtype=int64), array([30837], dtype=int64))
It is apparent that not a single '1' label is predicted, which explains the scores above. What causes this behavior, and how can I fix it?
Thank you