14

multiple softmax in last layer

Is it possible to implement multiple softmaxes in the last layer in Keras, so that the sum of nodes 1-4 = 1, the sum of nodes 5-8 = 1, etc.?

Should I go for a different network design?

arthurDent
  • 249
  • 1
  • 2
  • 4

2 Answers

10

I would use the functional interface.

Something like this:

from keras.layers import Activation, Input, Dense
from keras.models import Model
from keras.layers.merge import Concatenate

input_ = Input(shape=input_shape)  # input_shape is a placeholder for your data's shape

x = input_
# One Dense "head" of 4 units per softmax group
x1 = Dense(4)(x)
x2 = Dense(4)(x)
x3 = Dense(4)(x)
# Apply a separate softmax to each head, so each group of 4 sums to 1
x1 = Activation('softmax')(x1)
x2 = Activation('softmax')(x2)
x3 = Activation('softmax')(x3)
x = Concatenate()([x1, x2, x3])

model = Model(inputs=input_, outputs=x)
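
As the comments note, the targets then have to be laid out the same way as the output: one concatenated vector in which each group of 4 entries is one-hot. A rough sketch of compiling and fitting the model above, assuming an input_shape of (20,) and three groups (the data here is random and purely illustrative):

import numpy as np

# Illustrative random data: 20 input features, 12 outputs = 3 groups of 4
X = np.random.rand(100, 20)
y = np.zeros((100, 12))
for g in range(3):
    picks = np.random.randint(0, 4, size=100)
    y[np.arange(100), g * 4 + picks] = 1  # one-hot within each group

# Whether plain categorical_crossentropy is adequate for this grouped
# output is debated in the comments; this only shows that the model
# compiles and trains.
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X, y, epochs=5, batch_size=32)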
Martin Thoma
  • 18,630
  • 31
  • 92
  • 167
  • 1
    Worth noting the cost function is also going to require similar custom work. – Neil Slater Oct 10 '17 at 20:57
  • @NeilSlater This would be another question. – Martin Thoma Oct 11 '17 at 04:37
  • 3
    I'm not suggesting you add the solution, but I think leaving answer as-is gives the impression that OP's model-building work would be complete. But there's an equal extra amount of work for OP to do in other parts of code if they want to actually train the model. You could at least reference that requirement. Same applies to other answer . . . – Neil Slater Oct 11 '17 at 06:58
  • 2
    @NeilSlater you are absolutely right. I have no idea why I need a different cost function. Can you tell me why this is important? – arthurDent Oct 11 '17 at 10:45
  • 1
    @arthurDent - because Keras' multi-class cross-entropy loss is *probably* not geared up to cope with three simultaneous true classes on each example, and the separation into groups - error in one group may result in gradients incorrectly assigned to outputs in other groups. You could just try it and see what happens . . . it may still converge, but the balance point might not be as good as having three entirely separate networks. – Neil Slater Oct 11 '17 at 11:42
  • 1
    @arthurDent: . . . although I'm thinking it through in more detail, and the gradient at the logit for multiclass cross entropy with softmax, which is simply $\mathbf{\hat{y}} - \mathbf{y}$ may still apply and work successfully. A normal softmax output would of course fail to learn 3 classes simultaneously, but perhaps this answer is all you need after all . . . – Neil Slater Oct 11 '17 at 11:49
  • 1
    use of metrics e.g. `categorical_accuracy` and `predict_classes` methods may need more thought . . . – Neil Slater Oct 11 '17 at 11:51
  • 1
    When trying this, I get `ValueError: Output tensors to a Model must be the output of a Keras 'Layer' (thus holding past layer metadata). Found: `. Ah, it should be `x = Concatenate()([x1, x2, x3])`. – stefanbschneider Aug 15 '19 at 13:35
5

It is possible; just implement your own softmax function. Split the tensor into parts, compute the softmax separately for each part, and concatenate the parts again:

from keras import backend as K

def custom_softmax(t):
    # Use the static (compile-time) shape so the Python loop over groups works
    sh = K.int_shape(t)
    partial_sm = []
    for i in range(sh[1] // 4):
        partial_sm.append(K.softmax(t[:, i * 4:(i + 1) * 4]))
    return K.concatenate(partial_sm)

`K.concatenate` without an `axis` argument concatenates along the last axis (in our case axis=1).

Then you can use this activation function directly in a layer or add it as a separate Activation layer:

Dense(12, activation=custom_softmax)  # e.g. 3 groups of 4 nodes

or

model.add(Activation(custom_softmax))

You also need to define a new cost function.
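
For example, such a cost function could sum the standard categorical cross-entropy over each group. This is only a sketch under the same assumption of groups of 4 classes; the group size and the function name `grouped_crossentropy` are illustrative, not part of the original answer:

from keras import backend as K
from keras.losses import categorical_crossentropy

def grouped_crossentropy(y_true, y_pred):
    # Sum categorical cross-entropy over each block of 4 nodes
    group_size = 4
    n_groups = K.int_shape(y_pred)[1] // group_size
    loss = 0.0
    for i in range(n_groups):
        loss += categorical_crossentropy(
            y_true[:, i * group_size:(i + 1) * group_size],
            y_pred[:, i * group_size:(i + 1) * group_size])
    return loss

model.compile(optimizer='adam', loss=grouped_crossentropy)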

Primoz
  • 208
  • 3
  • 8