
I'm trying to reproduce the code in this paper here for a multi-label problem (11 classes), which uses

1- Embedding layer 
2- GRU 
3- two feed-forward layers with the ReLU activation function 
4- sigmoid output unit.

I've tried to run the code, but it shows the following error:

ValueError: Error when checking target: expected dense_5 to have 3 dimensions, but got array with shape (6838, 11)

Edit: The error is fixed. I changed `return_sequences` to `False` and removed `Flatten()` to fix the error.

My code is below; I'm not sure if the two feedforward layers are correct. The paper states FF1: 1024 units and FF2: 512 units, with a mini-batch size of 32. How can I state that in the code? (See the training sketch after the code block.)

from keras.layers import Input, Embedding, Bidirectional, GRU, Dense, Activation, Dropout
from keras.models import Model
from keras.optimizers import Adam

target_input = Input(shape=(max_length,))

# frozen pretrained embeddings
target_embedding = Embedding(input_dim=vocabulary_size, output_dim=embedding_dims,
                             input_length=max_length, weights=[embedding_matrix],
                             trainable=False)(target_input)

# target_embedding = Dropout(0.3)(target_embedding)

target_gru1 = Bidirectional(GRU(units=200, return_sequences=True, dropout=0.3, recurrent_dropout=0.3))(target_embedding)
target_gru = Bidirectional(GRU(units=200, return_sequences=False, dropout=0.3, recurrent_dropout=0.3))(target_gru1)

# target_gru = Dropout(0.3)(target_gru)

# 2 feedforward layers (FF1: 1024 units, FF2: 512 units, as stated in the paper)
# earlier attempt:
# target_output1 = Activation("relu")(target_gru)
# target_output2 = Activation("relu")(target_output1)
FF1 = Dense(1024)(target_gru)
target_output1 = Activation("relu")(FF1)
FF2 = Dense(512)(target_output1)

target_output = Dense(units=11, activation="sigmoid")(FF2)
target_model = Model(target_input, target_output)

## configuring the model for training:
opt = Adam(lr=0.0001)  # previously tried: lr=0.001, decay=0.5
target_model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["categorical_accuracy"])
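The mini-batch size of 32 from the paper is not part of the model definition; it is passed to `fit()` at training time. A minimal training sketch, assuming hypothetical arrays `X_train` (integer sequences padded to `max_length`) and `y_train` (multi-hot label vectors of length 11):

target_model.fit(X_train, y_train,
                 batch_size=32,          # mini-batch size stated in the paper
                 epochs=10,              # placeholder; tune as needed
                 validation_split=0.1)   # hold out 10% for validation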

And here are the layers:

[image: model summary]

Zahra Hnn
  • Possibly you're missing a `Flatten()` layer before the first Dense layer. – Random Nerd Jan 31 '20 at 09:12
  • @RandomNerd Yeah, thanks. May I know if the two feedforward layers are correctly written? The paper states FF1: 1024 units and FF2: 512 units, with a mini-batch size of 32. How can I state that in the code? – Zahra Hnn Feb 01 '20 at 09:13
  • 1
    FF1 = Dense(1024)(target_output2) ; FF2 = Dense( 512)(FF1) and then finally tgt_output = Dense(11)(FF2)..use relu n dropiuts between FF1 and 2 , IF need be – Vikram Murthy Feb 01 '20 at 13:58
  • @VikramMurthy May I know why you used `target_output2` as the input of FF1? I updated the code in my question. Or do you mean something like this: `target_output1 = Activation("relu")(target_gru)`; `FF1 = Dense(1024)(target_output1)`; `target_output2 = Activation("relu")(FF1)`; `FF2 = Dense(512)(target_output2)` – Zahra Hnn Feb 01 '20 at 14:59
  • Yeah, I saw the code before you updated it. What you've put in the comment above is what I mean now :) Hope it helps. – Vikram Murthy Feb 02 '20 at 05:13
  • This looks like keras code - you might want to add the corresponding tag... – hendrik Feb 02 '20 at 11:00

1 Answer


The error is caused by `return_sequences=True`. You set it to `True` only when you feed the output of a recurrent layer to another recurrent layer. With it enabled, the layer returns the hidden state at every timestep (a 3-D tensor), which can be fed to `Dense()` layers only through `Flatten()`.

Moreover, I suggest deleting the `Dropout()` layer after the `Embedding()`. The embedding contains information (i.e. the learned representation of words/chars) that is not safe to distort with dropout. This is just a suggestion, of course.


EDIT:

You need return_sequences = True only when a recurrent layer outputs to another recurrent layer. If the following layer is Dense(), then you can drop return_sequences = True and also Flatten().
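A minimal sketch of the shape difference, using the same layer types as the question's code (the variable names here are illustrative):

# return_sequences=True -> output shape (batch, max_length, 400):
# fine as input to another recurrent layer, but a Dense() head
# would need Flatten() first.
seq = Bidirectional(GRU(units=200, return_sequences=True))(target_embedding)

# return_sequences=False -> output shape (batch, 400):
# can be fed to Dense() directly, no Flatten() needed.
vec = Bidirectional(GRU(units=200, return_sequences=False))(seq)

# If you still want dropout, after the GRU output is a safer spot
# than right after the Embedding():
# vec = Dropout(0.3)(vec)

out = Dense(11, activation="sigmoid")(vec)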

Leevo
  • There are 2 GRUs, that's why I used `return_sequences=True` (I updated the code). The error is gone after adding `Flatten()`; however, I wasn't sure about the feedforward layers, as I didn't get results close to the paper's. – Zahra Hnn Feb 04 '20 at 01:14
  • You need `return_sequences = True` only when a recurrent layer outputs to another recurrent layer. If the following layer is `Dense()`, then you can drop `return_sequences = True` and also `Flatten()`. – Leevo Feb 04 '20 at 08:41
  • Thanks, I changed the code. May I know your opinion about `target_model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["categorical_accuracy"])`? Since it is a multi-label problem, should I use "categorical_crossentropy"? – Zahra Hnn Feb 04 '20 at 09:45
  • One preliminary question, before answering: are your 11 classes **mutually exclusive**? – Leevo Feb 04 '20 at 11:41
  • @Leevo, no, one text might belong to multiple classes; several of the 11 emotion classes can apply at once. – Zahra Hnn Feb 04 '20 at 12:05
  • I would set `loss = 'categorical_crossentropy'` and `metrics = ['categorical_accuracy']`. I wouldn't use binary crossentropy for any task except binary classification (which is not your case). – Leevo Feb 04 '20 at 12:19
  • Thanks, now I get `loss: 5.2472 - categorical_accuracy: 0.2371 - val_loss: 5.1103 - val_categorical_accuracy: 0.2720`. May I know which value I should save the weights based on? Is monitoring "loss" with mode "min" a good option? – Zahra Hnn Feb 04 '20 at 13:49
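Monitoring the validation loss with mode `min` is a common choice for saving the best weights. A minimal sketch using Keras' `ModelCheckpoint` callback, reusing the hypothetical `X_train`/`y_train` arrays from the training sketch above:

from keras.callbacks import ModelCheckpoint

# Save the weights that achieve the lowest validation loss seen so far.
checkpoint = ModelCheckpoint("best_weights.h5",
                             monitor="val_loss",     # value to track
                             mode="min",             # lower val_loss is better
                             save_best_only=True,    # keep only the best epoch
                             save_weights_only=True)

target_model.fit(X_train, y_train, batch_size=32, epochs=10,
                 validation_split=0.1, callbacks=[checkpoint])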