
I have trained my own fastText model, starting from the pretrained English model available on their website, with the following code:

from gensim.models.fasttext import load_facebook_model

mod = load_facebook_model('fasttext/cc.en.300.bin')
mod.build_vocab(sentences=list(df_train.text), update=True)
mod.train(sentences=list(df_train.text), total_examples=len(df_train.text), epochs=10)

Now I would like to extract the vectors from this embedding to train an LSTM neural network with them. Any tips on how to do so?

Thanks in advance.

IMB
  • Do you want to train these embeddings in your LSTM model, or can they be frozen (left unchanged during LSTM training)? – Mikhail_Sam Jun 18 '20 at 14:16
  • They can be frozen. – IMB Jun 18 '20 at 14:18
  • So then convert all your train/test text datasets into vectors using the fastText embeddings, and train your NN on those matrices. At inference, do the same: `fasttext_model.get_sentence_vector(sent)`, and feed it into the NN – Mikhail_Sam Jun 18 '20 at 14:20
  • What's the difference between that and creating an embedding layer? For instance, like they did in this tutorial: https://www.kaggle.com/vsmolyakov/keras-cnn-with-fasttext-embeddings – IMB Jun 18 '20 at 15:11
  • The fastText model has a lot of different built-in methods like `get_nearest_neighbors`, etc. You can also quantize it. If you used pretrained vectors for fastText training, you would need to convert them to an LSTM Embedding layer for a hot start to get the same results (I suppose you don't want to train on Wikipedia :) ). Also, I know fastText uses hashing during training (which is why it's called FASTtext). I'm not sure about the default Embedding layer in Keras – Mikhail_Sam Jun 18 '20 at 15:24
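Following up on the comments, a rough sketch of the frozen-embedding route could look like the code below. It builds an integer index per word, fills an embedding matrix row by row from the word vectors, and converts texts to padded index sequences. The `toy_wv` dict, `dim`, and `texts` here are toy stand-ins so the snippet is self-contained; with the real model above you would use `mod.wv[w]` and `dim = 300`.

```python
import numpy as np

# Stand-in for mod.wv: with the real gensim model you would do vec = mod.wv[word]
toy_wv = {
    "the": np.array([0.1, 0.2, 0.3]),
    "cat": np.array([0.4, 0.5, 0.6]),
    "sat": np.array([0.7, 0.8, 0.9]),
}
dim = 3  # cc.en.300.bin vectors have dim = 300
texts = [["the", "cat"], ["the", "cat", "sat"]]  # hypothetical tokenized data

# 1) Build a word -> integer index map (index 0 reserved for padding)
vocab = {w: i + 1 for i, w in enumerate(sorted({w for t in texts for w in t}))}

# 2) Fill the embedding matrix; row i holds the vector of the word with index i
emb = np.zeros((len(vocab) + 1, dim))
for w, i in vocab.items():
    emb[i] = toy_wv[w]  # with gensim: mod.wv[w]

# 3) Convert each text to a zero-padded sequence of indices
maxlen = max(len(t) for t in texts)
seqs = np.array([[vocab[w] for w in t] + [0] * (maxlen - len(t)) for t in texts])

print(seqs.shape)  # (2, 3)
```

The matrix would then initialize a frozen Keras layer, e.g. `Embedding(len(vocab) + 1, dim, weights=[emb], trainable=False)`, feeding the LSTM; since the weights are frozen, the fastText vectors are not modified during training.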

0 Answers