Incorrect alternation chosen when .*.* occurs in pattern #1360

@orlp

What version of regex are you using?

regex = "1.12.3"

Describe the bug at a high level.

If the pattern contains .*.*, regex no longer chooses the first alternative that matches.

What are the steps to reproduce the behavior?

fn main() {
    let pattern = ".*ab|.*.*";
    let haystack = "abc";
    // Collect and print every match reported by the iterator.
    dbg!(regex::Regex::new(pattern).unwrap().find_iter(haystack).collect::<Vec<_>>());
}

What is the actual behavior?

We see that regex returns one match:

[
    Match {
        start: 0,
        end: 3,
        string: "abc",
    },
]

What is the expected behavior?

The correct behavior is found in regex-lite:

[
    Match {
        start: 0,
        end: 2,
        string: "ab",
    },
    Match {
        start: 2,
        end: 3,
        string: "c",
    },
]

The first alternative (.*ab) matches starting at position 0, so it must be chosen.

If the pattern is changed from .*ab|.*.* to .*ab|.* then regex behaves correctly, returning two matches.
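
To make the expected result concrete, here is a minimal sketch of a toy backtracking matcher that applies leftmost-first semantics by hand. It is not how regex or regex-lite are implemented. All names in it (`Atom`, `Piece`, `parse_branch`, `match_here`, `find_at`, `find_all`) are hypothetical, and it supports only literal bytes, `.`, greedy `*`, and top-level `|` on ASCII input. It also assumes the iterator skips an empty match that is adjacent to the previous match, which is consistent with both outputs shown above.

```rust
// Toy leftmost-first backtracking matcher; NOT how the regex crate works internally.
// Supports only literal bytes, `.`, greedy `*`, and top-level `|` on ASCII input.

#[derive(Clone, Copy)]
enum Atom {
    Any,     // `.`: any byte except b'\n'
    Lit(u8), // a literal byte
}

struct Piece {
    atom: Atom,
    star: bool, // true when the atom is followed by `*`
}

// Parse one alternative (no `|` inside) into a sequence of pieces.
fn parse_branch(src: &str) -> Vec<Piece> {
    let bytes = src.as_bytes();
    let mut pieces = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let atom = if bytes[i] == b'.' { Atom::Any } else { Atom::Lit(bytes[i]) };
        let star = i + 1 < bytes.len() && bytes[i + 1] == b'*';
        pieces.push(Piece { atom, star });
        i += if star { 2 } else { 1 };
    }
    pieces
}

fn atom_matches(atom: Atom, byte: u8) -> bool {
    match atom {
        Atom::Any => byte != b'\n',
        Atom::Lit(c) => byte == c,
    }
}

// End offset of the preferred match of `pieces` anchored at `pos`, if any.
fn match_here(pieces: &[Piece], hay: &[u8], pos: usize) -> Option<usize> {
    let (first, rest) = match pieces.split_first() {
        Some(split) => split,
        None => return Some(pos), // nothing left to match
    };
    if first.star {
        // Greedy: take as many bytes as possible, then back off one at a time.
        let mut end = pos;
        while end < hay.len() && atom_matches(first.atom, hay[end]) {
            end += 1;
        }
        loop {
            if let Some(e) = match_here(rest, hay, end) {
                return Some(e);
            }
            if end == pos {
                return None;
            }
            end -= 1;
        }
    } else if pos < hay.len() && atom_matches(first.atom, hay[pos]) {
        match_here(rest, hay, pos + 1)
    } else {
        None
    }
}

// Leftmost-first: the earliest start wins; at that start, the first alternative wins.
fn find_at(branches: &[Vec<Piece>], hay: &[u8], from: usize) -> Option<(usize, usize)> {
    for start in from..=hay.len() {
        for branch in branches {
            if let Some(end) = match_here(branch, hay, start) {
                return Some((start, end));
            }
        }
    }
    None
}

// Collect successive non-overlapping matches as (start, end) pairs.
fn find_all(pattern: &str, haystack: &str) -> Vec<(usize, usize)> {
    let branches: Vec<Vec<Piece>> = pattern.split('|').map(parse_branch).collect();
    let hay = haystack.as_bytes();
    let mut out = Vec::new();
    let mut start = 0;
    let mut last_end: Option<usize> = None;
    while start <= hay.len() {
        let (s, e) = match find_at(&branches, hay, start) {
            Some(m) => m,
            None => break,
        };
        if s == e && Some(e) == last_end {
            // Assumption: an empty match adjacent to the previous match is skipped.
            start += 1;
            continue;
        }
        out.push((s, e));
        last_end = Some(e);
        start = e;
    }
    out
}

fn main() {
    println!("{:?}", find_all(".*ab|.*.*", "abc"));
    println!("{:?}", find_all(".*ab|.*", "abc"));
}
```

Under these assumptions both patterns yield the spans (0, 2) and (2, 3). At position 0 the first alternative backtracks until `ab` fits, so the second alternative is never tried there.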
