
I'm using a multilingual BERT model (via sentence-transformers) to transform text into 768-dimensional vectors:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2') 

Now I want to put the model into production, but the embedding time is too high. I want to optimize the model to reduce the embedding time. Which libraries enable me to do this?

1 Answer


You can start by using TorchScript. It may require changing your whole code and switching to the transformers library (loading the backbone of the model and its last layers), but it basically lets you get out of the Python GIL, which does not support true multithreading. With TorchScript you can also run your model in a C++ environment. There is also ONNX, which I believe improves performance.
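
For example, a minimal sketch of that idea: loading the backbone through transformers, tracing it with TorchScript, and pooling the token embeddings yourself. The mean pooling and the fixed max_length=128 are assumptions about how this particular model builds its 768-dim sentence vector, not code taken from sentence-transformers itself:

import torch
from transformers import AutoTokenizer, AutoModel

model_name = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# torchscript=True makes the model return plain tuples, which tracing needs
model = AutoModel.from_pretrained(model_name, torchscript=True)
model.eval()

# Trace the backbone with a dummy batch so it can run outside Python,
# e.g. loaded from C++ with torch::jit::load.
dummy = tokenizer(["warm-up sentence"], return_tensors="pt",
                  padding="max_length", max_length=128, truncation=True)
traced = torch.jit.trace(model, (dummy["input_ids"], dummy["attention_mask"]))
traced.save("mpnet_traced.pt")

def embed(sentences):
    # Tokenize, run the traced backbone, then mean-pool the token embeddings
    # (mean pooling is an assumption about this model's pooling strategy).
    enc = tokenizer(sentences, return_tensors="pt", padding="max_length",
                    max_length=128, truncation=True)
    with torch.no_grad():
        token_embeddings = traced(enc["input_ids"], enc["attention_mask"])[0]
    mask = enc["attention_mask"].unsqueeze(-1).float()
    return (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

print(embed(["Hello world"]).shape)  # torch.Size([1, 768])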

If your use case is not real-time and you are exposing the model through an API, you can also use a queue mechanism like RabbitMQ and compute the embeddings asynchronously in a worker.
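
For example, a minimal sketch of a worker that consumes embedding jobs from RabbitMQ using pika (the queue name, host, and JSON message format are just assumptions):

import json
import pika
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="embed_jobs", durable=True)

def on_message(ch, method, properties, body):
    # Each message is assumed to be a JSON list of sentences.
    sentences = json.loads(body)
    vectors = model.encode(sentences)  # 768-dim embeddings
    # Store or forward `vectors` here (e.g. write them to a database).
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_qos(prefetch_count=1)    # hand out one batch at a time
channel.basic_consume(queue="embed_jobs", on_message_callback=on_message)
channel.start_consuming()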

– Simone
  • What is the difference between real-time and using an API? "it basically lets you get out of the Python GIL, which does not support true multithreading": can you explain in more detail what you mean? @SADAK – Mohy Mohamed Jul 11 '21 at 13:59