Fix BLEU zero division errors for empty inputs by zanvari · Pull Request #764 · huggingface/evaluate

zanvari · 2026-06-15T20:53:25Z

Summary

This PR fixes the issue reported in #601, where the BLEU metric raises a ZeroDivisionError when either the reference or prediction tokenizes to an empty sequence.

The issue can occur when evaluating inputs such as:

- Non-printable characters that result in an empty tokenized reference (e.g. chr(12), the form-feed character).
- Empty prediction strings.

In these cases, BLEU computation may reach the brevity penalty calculation with either reference_length == 0 or translation_length == 0, leading to a division-by-zero error instead of returning a valid metric output.
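
To make the failure mode concrete, here is a minimal sketch of an unguarded brevity penalty. It follows the standard BLEU definition and is not copied from nmt_bleu.py. The function names are illustrative, and the plain whitespace split is only an assumed stand-in for the metric's real tokenizer.

```python
import math


def brevity_penalty(translation_length: int, reference_length: int) -> float:
    """Unguarded brevity penalty from the standard BLEU definition (illustrative)."""
    # Raises ZeroDivisionError when reference_length == 0.
    ratio = translation_length / reference_length
    if ratio > 1.0:
        return 1.0
    # Raises ZeroDivisionError when translation_length == 0 (ratio is 0.0).
    return math.exp(1 - 1.0 / ratio)


def raises_zero_division(translation_length: int, reference_length: int) -> bool:
    """Return True if the unguarded penalty fails for these lengths."""
    try:
        brevity_penalty(translation_length, reference_length)
    except ZeroDivisionError:
        return True
    return False


# Assumption: a plain whitespace split stands in for the real tokenizer.
reference_tokens = chr(12).split()  # form-feed counts as whitespace, so no tokens remain
prediction_tokens = "".split()      # an empty prediction also yields no tokens

print(len(reference_tokens), raises_zero_division(1, len(reference_tokens)))
print(len(prediction_tokens), raises_zero_division(len(prediction_tokens), 1))
```

Both zero-length cases fail in the same few lines, which is why a single guard before the ratio is computed covers them.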

Fixes #601.

Reproduction

The following examples previously raised ZeroDivisionError:

from evaluate import load

bleu = load("bleu")

bleu.compute(
    predictions=["hello"],
    references=[[chr(12)]]
)

bleu.compute(
    predictions=[""],
    references=[["test"]]
)

Errors were raised during the computation of the BLEU brevity penalty.

Changes

Added handling for cases where:
- reference_length == 0
- translation_length == 0

In either case, the metric now returns a valid BLEU result with a score of 0.0 instead of raising an exception.
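
The guard described above could look roughly like the sketch below. This is not the diff in this PR: bleu_with_guard and its geo_mean argument are hypothetical, and the values reported for brevity_penalty and length_ratio in the degenerate case are assumptions chosen for illustration.

```python
import math
from typing import Dict, Union

Number = Union[int, float]


def bleu_with_guard(
    geo_mean: float, translation_length: int, reference_length: int
) -> Dict[str, Number]:
    """Combine an n-gram precision mean with a guarded brevity penalty (hypothetical helper)."""
    if translation_length == 0 or reference_length == 0:
        # Degenerate input: skip the division entirely and report the lowest score.
        # Assumption: the penalty and ratio are reported as 0.0 here.
        bleu, bp, ratio = 0.0, 0.0, 0.0
    else:
        ratio = translation_length / reference_length
        bp = 1.0 if ratio > 1.0 else math.exp(1 - 1.0 / ratio)
        bleu = geo_mean * bp
    return {
        "bleu": bleu,
        "brevity_penalty": bp,
        "length_ratio": ratio,
        "translation_length": translation_length,
        "reference_length": reference_length,
    }


print(bleu_with_guard(0.0, 1, 0)["bleu"])  # empty reference
print(bleu_with_guard(0.0, 0, 1)["bleu"])  # empty prediction
print(bleu_with_guard(1.0, 4, 4)["bleu"])  # identical prediction and reference
```

Placing the check before the ratio is computed keeps the non-degenerate path unchanged.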

While implementing this fix, I also vendored nmt_bleu.py into the BLEU metric directory and replaced the external import with a local import. This removes the external GitHub dependency and incidentally resolves the offline-loading issue discussed in #565.

Rationale

When either the reference length or translation length is zero, BLEU is not meaningfully defined. Returning a BLEU score of 0.0 is preferable to raising an exception because:

- Evaluation can continue without crashing.
- The result clearly indicates no overlap between prediction and reference.
- The behavior is consistent with the expectation that degenerate inputs should yield the lowest possible score rather than terminate evaluation.

Validation

I verified the fix using the examples above.

Empty-tokenized reference

bleu.compute(
    predictions=["hello"],
    references=[[chr(12)]]
)

Result:

{
    "bleu": 0.0,
    ...
}

Empty prediction

bleu.compute(
    predictions=[""],
    references=[["test"]]
)

Result:

{
    "bleu": 0.0,
    ...
}

Normal BLEU computation

bleu.compute(
    predictions=["hello there general kenobi"],
    references=[["hello there general kenobi"]]
)

Result:

{
    "bleu": 1.0,
    ...
}

Testing

- Reproduced the issue reported in #601 ("Evaluation of form feed symbol with BLEU results in error").
- Verified that both failure cases now return valid BLEU outputs.
- Verified that standard BLEU behavior remains unchanged for valid inputs.
- Ran tests/test_load.py successfully.
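
The manual checks above could also be captured as a small reusable regression check. The sketch below is not part of this PR's test suite. check_bleu_edge_cases is a hypothetical helper that accepts any callable with the same keyword arguments as bleu.compute, so it can be run without network access against a stand-in.

```python
from typing import Callable, Dict, List, Tuple

# (predictions, references, expected BLEU), taken from the examples in this PR.
CASES: List[Tuple[List[str], List[List[str]], float]] = [
    (["hello"], [[chr(12)]], 0.0),  # reference tokenizes to nothing
    ([""], [["test"]], 0.0),        # empty prediction
    (["hello there general kenobi"], [["hello there general kenobi"]], 1.0),
]


def check_bleu_edge_cases(compute: Callable[..., Dict]) -> None:
    """Assert that compute returns the expected BLEU for each case without raising."""
    for predictions, references, expected in CASES:
        result = compute(predictions=predictions, references=references)
        assert abs(result["bleu"] - expected) < 1e-9, (predictions, references, result)
```

With the evaluate package installed, passing load("bleu").compute to check_bleu_edge_cases would exercise the real metric.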

If maintainers prefer, I would be happy to split the offline-loading change related to #565 into a separate PR and keep this PR focused solely on the ZeroDivisionError fix.

Fix BLEU zero division errors for empty inputs

4c312af

zanvari mentioned this pull request Jun 15, 2026

Evaluation of form feed symbol with BLEU results in error #601

Open

zanvari wants to merge 1 commit into huggingface:main from zanvari:fix-bleu-zero-division-empty-input
