
I want to implement a custom Keras loss function that consists of plain binary cross-entropy plus a penalty that increases the loss for false negatives from one group (each observation belongs to one of two groups, privileged and unprivileged) and decreases the loss for true positives from that same group.

My implementation so far can be seen below. Unfortunately, it does not work yet: as you can see, I simply add the penalty to the binary cross-entropy, and added constants do not enter the derivative, so the penalty does not affect the gradients. Do you have any idea how I can fix this without changing the general idea of the penalty?

priv is an additional tensor encoding to which group an observation belongs.

Any help is appreciated and you might literally save my master's thesis by solving this.
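
For concreteness, the combined loss I am aiming for looks roughly like this, where $g_i = 1$ marks the group the penalty applies to, $\tau$ is the decision threshold (a prediction counts as negative when $\hat y_i \le \tau$), and loss and gain are the two constants used in the code below, with gain chosen negative so that it lowers the total:

$$\mathcal{L} \;=\; \mathrm{BCE}(y, \hat y) \;+\; \frac{1}{N}\sum_{i=1}^{N} g_i \left(\text{loss}\cdot\mathbb{1}[\text{false negative}_i] \;+\; \text{gain}\cdot\mathbb{1}[\text{true positive}_i]\right)$$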

import tensorflow as tf
from tensorflow.keras import backend as K

# tau, gain and loss are module-level constants: the decision threshold and the two penalty weights
def customLoss2(priv):

  def binary_crossentropy_adjusted_groupspecific(y_true, y_pred):

    #Binary tensor that is 1 for predictions at or below tau, 0 otherwise
    temp = tf.subtract(y_pred, tau)
    temp = K.relu(temp)
    less_than_tau = tf.multiply(tf.subtract(K.sign(temp), 1.0), -1.0)

    #Inversion of the privileged tensor, so that 1 marks unprivileged observations
    temp = tf.subtract(priv, 1.0)
    unpriv = tf.multiply(temp, -1.0)

    #Inversion of the true label tensor
    temp = tf.subtract(y_true, 1.0)
    inverted_y_true = tf.multiply(temp, -1.0)

    #Creating tensor with the gain for all unprivileged true negatives
    gains = tf.multiply(tf.multiply(less_than_tau, unpriv), tf.multiply(inverted_y_true, gain))

    #Creating tensor with the loss for all unprivileged false negatives
    losses = tf.multiply(tf.multiply(less_than_tau, unpriv), tf.multiply(y_true, loss))

    #Concatenating the tensors to take their mean
    bce = K.mean(K.binary_crossentropy(y_true, y_pred))
    conc = K.mean(K.concatenate([gains, losses], axis=0))

    total = bce + conc

    #return result
    return total

  #return the loss function to keras
  return binary_crossentropy_adjusted_groupspecific
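
To make this easier to reproduce, here is a minimal way to exercise the function on a toy batch; the values of tau, gain, loss and priv are placeholders, not the ones from my actual experiment:

import tensorflow as tf

# Placeholder constants; tau, gain and loss are the free variables picked up by the loss above
tau, gain, loss = 0.5, -1.0, 1.0

# Toy batch: group membership (1 = privileged), true labels, predictions
priv = tf.constant([1.0, 0.0, 0.0, 1.0])
y_true = tf.constant([1.0, 1.0, 0.0, 0.0])
y_pred = tf.constant([0.3, 0.2, 0.4, 0.8])

loss_fn = customLoss2(priv)
print(float(loss_fn(y_true, y_pred)))  # plain BCE plus the group-specific penalty

For training, the inner function returned here is what gets passed to Keras, e.g. model.compile(loss=customLoss2(priv)).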
  • How do you know that "added constants do not enter the derivative"? Your custom loss uses y_pred which depends on the weights of your network. – Valentas Feb 03 '22 at 09:20
  • Valid point, actually there should not be a problem with the derivative. But if that is not the problem, do you have any idea what it might be? Because however large I make the gain and loss parameters, the predictions don't really change. I tried dividing the added penalty by the binary cross-entropy, after which the parameters did influence the predictions significantly. That's why I thought the penalty might not enter the derivative without this. – Tim Feb 04 '22 at 10:33
  • I didn't try to check your algorithm, but it looks more complicated than what you describe. Maybe there is some mistake in it? You can try to experiment by adding simpler custom terms. – Valentas Feb 04 '22 at 13:41
