
I am doing a project using the T5 Transformer. I have read the documentation for the T5 Transformer model, but while using T5Tokenizer I am confused about tokenizing my sentences.

Can someone please help me understand the difference between batch_encode_plus() and encode_plus(), and when I should use each?

10sha25

1 Answer


See also the Hugging Face documentation, but as the names suggest, batch_encode_plus tokenizes a batch of (pairs of) sequences, whereas encode_plus tokenizes just a single sequence (or pair). According to the documentation, both of these methods are deprecated and you should use __call__ instead, which checks by itself whether the input is batched and calls the correct method (see the source code, where the is_batched variable and an if statement handle the dispatch).
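The dispatch can be sketched like this. Note this is a simplified toy stand-in, not the real transformers source: `whitespace_encode` and `tokenize` are hypothetical helpers, with whitespace splitting in place of real subword tokenization, just to show how a single `__call__`-style entry point can route a `str` to single-sequence encoding and a `list` of strings to batch encoding:

```python
def whitespace_encode(text: str) -> list:
    """Toy stand-in for single-sequence tokenization (real code uses subwords)."""
    return text.split()


def tokenize(text):
    """Dispatch the way __call__ does: decide whether the input is batched,
    then take the batch path or the single-sequence path accordingly."""
    is_batched = isinstance(text, (list, tuple))
    if is_batched:
        # batch_encode_plus-style path: encode every sequence in the batch
        return [whitespace_encode(t) for t in text]
    # encode_plus-style path: encode the one sequence
    return whitespace_encode(text)
```

With the real T5Tokenizer you get the same convenience by calling the tokenizer object directly, e.g. `tokenizer("a sentence")` for one sequence or `tokenizer(["first", "second"])` for a batch, instead of picking between the two deprecated methods yourself.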

Oxbowerce