Question regarding token_set_ratio #468

@bhargavc-png

I am looking at the token_set_ratio computation in fuzzy_py.py.

When the differences between the two strings are compared, the code does the following:

    dist = indel_distance(diff_ab_joined, diff_ba_joined, score_cutoff=cutoff_distance)

    if dist <= cutoff_distance:
        result = _norm_distance(dist, sect_ab_len + sect_ba_len, score_cutoff)

Why is "sect_ab_len + sect_ba_len" used for normalization?
The strings being compared are diff_ab_joined and diff_ba_joined.
So shouldn't the normalization use "ab_len + ba_len" instead of "sect_ab_len + sect_ba_len"?
Because "sect_ab_len + sect_ba_len" is the larger denominator, the same distance yields more generous scores.
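
To make the two normalizations concrete, here is a minimal, self-contained sketch. It does not import the library; `lcs_length`, `indel_distance` and `norm_similarity` are hypothetical pure-Python stand-ins, and the token strings are made-up values. It assumes that sect_ab_len is the length of the shared tokens joined to the differing tokens with a single space, which the excerpt above does not confirm. Under that assumption, the shared prefix adds nothing to the indel distance, so the distance between the two difference strings equals the distance between the two combined strings, and only the denominator changes.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Insertions plus deletions needed to turn a into b."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def norm_similarity(dist: int, lensum: int) -> float:
    """Hypothetical stand-in: map a distance to a 0-100 score."""
    return 100.0 if lensum == 0 else 100.0 * (1 - dist / lensum)


# Made-up token sets: shared tokens and the tokens unique to each side.
sect = "fuzzy was"
diff_ab_joined = "bear"
diff_ba_joined = "beer"

# Assumed meaning of sect_ab_len / sect_ba_len: shared part + space + difference.
combined_a = f"{sect} {diff_ab_joined}"
combined_b = f"{sect} {diff_ba_joined}"

ab_len, ba_len = len(diff_ab_joined), len(diff_ba_joined)
sect_ab_len, sect_ba_len = len(combined_a), len(combined_b)

dist_diff = indel_distance(diff_ab_joined, diff_ba_joined)
dist_combined = indel_distance(combined_a, combined_b)

# Same distance, two different denominators.
score_diff_only = norm_similarity(dist_diff, ab_len + ba_len)
score_with_sect = norm_similarity(dist_diff, sect_ab_len + sect_ba_len)

print(dist_diff, dist_combined)          # 2 2
print(score_diff_only, score_with_sect)  # 75.0 and roughly 92.86
```

If that assumption holds, the quoted lines would be scoring sect+diff_ab against sect+diff_ba while computing the distance on the shorter difference strings only, which would explain the larger denominator. Whether that is the intended behaviour is exactly what this issue asks.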
