I am looking into the chatbot tutorial at:
https://medium.com/predict/creating-a-chatbot-from-scratch-using-keras-and-tensorflow-59e8fc76be79
It uses a sequence-to-sequence model with an encoder and a decoder to solve the problem:
import tensorflow as tf

# Encoder: embeds the question tokens and keeps only the final LSTM states
# (VOCAB_SIZE comes from the tutorial's preprocessing step)
encoder_inputs = tf.keras.layers.Input(shape=(None,))
encoder_embedding = tf.keras.layers.Embedding(VOCAB_SIZE, 200, mask_zero=True)(encoder_inputs)
encoder_outputs, state_h, state_c = tf.keras.layers.LSTM(200, return_state=True)(encoder_embedding)
encoder_states = [state_h, state_c]

# Decoder: embeds the answer tokens and runs an LSTM initialized with the encoder states
decoder_inputs = tf.keras.layers.Input(shape=(None,))
decoder_embedding = tf.keras.layers.Embedding(VOCAB_SIZE, 200, mask_zero=True)(decoder_inputs)
decoder_lstm = tf.keras.layers.LSTM(200, return_state=True, return_sequences=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)

# Project every decoder timestep onto the vocabulary
decoder_dense = tf.keras.layers.Dense(VOCAB_SIZE, activation=tf.keras.activations.softmax)
output = decoder_dense(decoder_outputs)

model = tf.keras.models.Model([encoder_inputs, decoder_inputs], output)
model.compile(optimizer=tf.keras.optimizers.RMSprop(), loss='categorical_crossentropy')
model.summary()
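For context, here is a rough sketch of how I would expect the training call for this two-input model to look. The toy arrays and their names are my own guesses (not code from the tutorial), and I assume VOCAB_SIZE and model from the block above, with answers wrapped in start/end tokens as is common for seq2seq setups:

import numpy as np

# Toy stand-ins for the tutorial's preprocessed data (purely illustrative)
num_samples, max_q_len, max_a_len = 4, 6, 7
encoder_input_data = np.random.randint(1, VOCAB_SIZE, size=(num_samples, max_q_len))  # tokenized questions
decoder_input_data = np.random.randint(1, VOCAB_SIZE, size=(num_samples, max_a_len))  # tokenized answers, beginning with a start token

# Targets: the same answers shifted one step to the left, one-hot encoded over the vocabulary
decoder_target_data = np.zeros((num_samples, max_a_len, VOCAB_SIZE))
shifted = np.roll(decoder_input_data, -1, axis=1)
shifted[:, -1] = 0  # pad the final position
decoder_target_data[np.arange(num_samples)[:, None], np.arange(max_a_len), shifted] = 1.0

# The answer sequence is passed in twice: as the second model input and, shifted, as the target
model.fit([encoder_input_data, decoder_input_data], decoder_target_data, batch_size=2, epochs=1)

So, as far as I can tell, the answer is fed both as a model input and (shifted by one step) as the target, and that is exactly the part I find confusing.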
I understand that the chatbot question needs to be the input of the encoder, and that the chatbot answer needs to be the output of the decoder. However, I do not understand why the chatbot answer (decoder_inputs) also has to be an input to the decoder, and to the entire model:
model = tf.keras.models.Model([encoder_inputs, decoder_inputs], output)
Could anyone please share their thoughts? Are there any papers on this approach? What is the intuition behind feeding the chatbot answer to the decoder as an input? Thanks!