translate_pattern incorrectly handles \s when preceded by \- inside a character class #100

@daniel55411

Description

When \s appears after \- inside a character class, translate_pattern fails to translate
\s into whitespace codepoints and leaves it as a literal \\s in the resulting Python pattern.

As a result, XSD pattern validation via xmlschema silently gives the wrong result: strings containing whitespace no longer match the pattern, and nothing indicates that \s was mistranslated.
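To illustrate the effect without depending on elementpath, here is a minimal sketch using only the standard `re` module. The first pattern is the output reported below for the failing case; the second is a hand-written stand-in for the intended behavior (an assumption, not actual library output). The names `untranslated`, `intended`, and `sample` are made up for this example.

```python
import re

# Python pattern reported for r"[\-\s',]{1,255}".
# Inside the class, \\ is a literal backslash and s is a literal letter "s".
untranslated = re.compile(r"[',\-\\s]{1,255}")

# Hand-written equivalent of the intended result (assumption, not library output).
intended = re.compile(r"[\s\-',]{1,255}")

sample = "- ,'"  # hyphen, space, comma, apostrophe

# The space is not in the mistranslated class, so the whole string is rejected.
untranslated_ok = untranslated.fullmatch(sample) is not None
# With \s treated as whitespace, the same string is accepted.
intended_ok = intended.fullmatch(sample) is not None
# The mistranslated class accepts a backslash followed by "s" instead.
literal_ok = untranslated.fullmatch("\\s") is not None

print(untranslated_ok, intended_ok, literal_ok)
```

The mistranslated class still compiles and still matches some input, which is why the problem does not surface as an error.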

Steps to Reproduce

from elementpath.regex import translate_pattern

# \s is NOT translated: it stays as a literal \\s
print(translate_pattern(r"[\-\s',]{1,255}"))
# Output: [',\-\\s]{1,255}

# \s IS translated correctly
print(translate_pattern(r"[\s\-',]{1,255}"))
# Output (\s is expanded to raw whitespace codepoints, so the printed pattern spans lines):
# [
#  ',\-]{1,255}

Expected Behavior

\s should be translated to the corresponding whitespace codepoints regardless of its position
within the character class.
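A possible regression check for this expectation, as a rough sketch: it takes the translation function as a parameter so it can run on its own, and it assumes the translated pattern can be passed directly to `re.fullmatch`. The helper `orderings_missing_whitespace` and the list `ORDERINGS` are hypothetical names, not part of elementpath.

```python
import re
from typing import Callable, Iterable, List


def orderings_missing_whitespace(
    translate: Callable[[str], str],
    xsd_patterns: Iterable[str],
) -> List[str]:
    """Return the XSD patterns whose translation rejects a single space."""
    failing = []
    for xsd_pattern in xsd_patterns:
        python_pattern = translate(xsd_pattern)
        # Every pattern checked here contains \s, so a lone space should match.
        if re.fullmatch(python_pattern, " ") is None:
            failing.append(xsd_pattern)
    return failing


# The two orderings from this report; both should accept whitespace.
ORDERINGS = [r"[\-\s',]{1,255}", r"[\s\-',]{1,255}"]
```

With elementpath installed, calling `orderings_missing_whitespace(translate_pattern, ORDERINGS)` should return an empty list once the behavior is order-independent. Based on the outputs in this report, version 5.1.1 would presumably return the first ordering.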

Actual Behavior

When \- appears before \s in the character class, \s is not translated and remains as \\s
in the output Python pattern.

Workaround

Moving \s to the beginning of the character class fixes the issue:

translate_pattern(r"[\s\-',]{1,255}")

Environment

Name: elementpath
Version: 5.1.1
Python: 3.11
Required-by: xmlschema
