I was looking for a way to explore evaluation metrics for machine translation models and came across spBLEU. I can't find any implementations or examples that would help me get started. Does anyone have a lead on what I can pursue?
Thanks in advance!
spBLEU was introduced in the Flores-101 article:
[...] we propose to use BLEU over text tokenized with a single language-agnostic and publicly available fixed SentencePiece subword model. We call this evaluation method spBLEU, for brevity. It has the benefit of continuing to use a metric that the community is familiar with, while addressing the proliferation of tokenizers.
For this, we have trained a SentencePiece (SPM) tokenizer (Kudo and Richardson, 2018) with 256,000 tokens using monolingual data (Conneau et al., 2020; Wenzek et al., 2019) from all the Flores-101 languages. SPM is a system that learns subword units based on training data, and does not require tokenization. The logic is not dependent on language, as the system treats all sentences as sequences of Unicode. Given the large amount of multilingual data and the large number of languages, this essentially provides a universal tokenizer, that can operate on any language.
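In plain terms: spBLEU tokenizes both hypotheses and references with that one fixed SPM model, then computes ordinary BLEU over the resulting pieces. If you want to see what the tokenization step actually does, here is a minimal sketch using the sentencepiece Python package; the model path is a placeholder for the Flores SPM file, not its real name:

# Minimal sketch of SPM encoding (pip install sentencepiece).
# "flores_spm.model" is a placeholder path for the Flores SPM model file.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="flores_spm.model")

# SPM operates directly on raw Unicode text, so no language-specific
# pre-tokenization is needed before encoding.
pieces = sp.encode("The quick brown fox jumps over the lazy dog.", out_type=str)
print(" ".join(pieces))  # subword pieces, with "▁" marking word boundaries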
The SentencePiece model and some related utilities can be found in the Flores GitHub repo.
To use it, follow their instructions (lightly adapted here):
# tokenize the hypotheses with SPM
python scripts/spm_encode.py \
    --model flores_spm_model_here \
    --output_format=piece \
    --inputs={untok_hyp_file} \
    --outputs={hyp_file}

# score with sacrebleu; {ref_file} must be SPM-tokenized the same way,
# and --tokenize none stops sacrebleu from re-tokenizing the pieces
cat {hyp_file} | sacrebleu --tokenize none {ref_file}
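If you'd rather do everything from Python, newer sacrebleu releases (2.x) bundle this same Flores-101 SPM model as a built-in tokenizer, so you can skip the manual encoding step entirely. A minimal sketch, assuming sacrebleu >= 2.0; note the tokenizer name has varied across versions ("spm" in early 2.0 releases, "flores101" later):

# Minimal sketch: spBLEU via sacrebleu's built-in Flores-101 tokenizer.
# pip install "sacrebleu>=2.0"; the SPM model is downloaded on first use.
from sacrebleu.metrics import BLEU

hyps = ["The cat sat on the mat."]            # one system output per segment
refs = [["The cat is sitting on the mat."]]   # one list per reference stream

spbleu = BLEU(tokenize="flores101")  # may be "spm" on older 2.0.x versions
score = spbleu.corpus_score(hyps, refs)
print(score)  # a BLEU score computed over the SentencePiece pieces

This should give the same number as the command-line pipeline above, since both routes tokenize with the same fixed SPM model before running BLEU.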