
I'm using a multilingual BERT model (via sentence-transformers) to transform text into 768-dimensional vectors:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2') 

Now I want to put the model into production, but the embedding time is too high. I want to optimize the model to reduce the embedding time. Which libraries enable me to do this?

1 Answer


You can start by using TorchScript. It may require changing your whole code and switching to the transformers library (loading the backbone of the model and its last layers), but it basically lets you get out of the Python GIL, which does not support true multithreading. With TorchScript you can also run your model in a C++ environment. There is also ONNX, which I believe improves performance.
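
For example, a minimal sketch of that idea: loading the backbone through transformers, tracing it with TorchScript, and pooling the token embeddings yourself. The mean pooling and the fixed max_length=128 are assumptions about how this particular model builds its 768-dim sentence vector, not code taken from sentence-transformers itself:

import torch
from transformers import AutoTokenizer, AutoModel

model_name = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# torchscript=True makes the model return plain tuples, which tracing needs
model = AutoModel.from_pretrained(model_name, torchscript=True)
model.eval()

# Trace the backbone with a dummy batch so it can run outside Python,
# e.g. loaded from C++ with torch::jit::load.
dummy = tokenizer(["warm-up sentence"], return_tensors="pt",
                  padding="max_length", max_length=128, truncation=True)
traced = torch.jit.trace(model, (dummy["input_ids"], dummy["attention_mask"]))
traced.save("mpnet_traced.pt")

def embed(sentences):
    # Tokenize, run the traced backbone, then mean-pool the token embeddings
    # (mean pooling is an assumption about this model's pooling strategy).
    enc = tokenizer(sentences, return_tensors="pt", padding="max_length",
                    max_length=128, truncation=True)
    with torch.no_grad():
        token_embeddings = traced(enc["input_ids"], enc["attention_mask"])[0]
    mask = enc["attention_mask"].unsqueeze(-1).float()
    return (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

print(embed(["Hello world"]).shape)  # torch.Size([1, 768])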

If your use case is not real-time and you are exposing the model through an API, you can also use a queue mechanism like RabbitMQ and compute the embeddings asynchronously in a worker.
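
For example, a minimal sketch of a worker that consumes embedding jobs from RabbitMQ using pika (the queue name, host, and JSON message format are just assumptions):

import json
import pika
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="embed_jobs", durable=True)

def on_message(ch, method, properties, body):
    # Each message is assumed to be a JSON list of sentences.
    sentences = json.loads(body)
    vectors = model.encode(sentences)  # 768-dim embeddings
    # Store or forward `vectors` here (e.g. write them to a database).
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_qos(prefetch_count=1)    # hand out one batch at a time
channel.basic_consume(queue="embed_jobs", on_message_callback=on_message)
channel.start_consuming()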

– Simone
  • What is the difference between real-time and using an API? "it basically lets you get out of the Python GIL, which does not support true multithreading": can you explain in more detail what you mean? @SADAK – Mohy Mohamed Jul 11 '21 at 13:59