RANGER-5647: Fix date-bound tag policy engine test fixtures #1018

Merged
ramackri merged 2 commits into master from RANGER-5647-patch
Jun 15, 2026

ramackri commented Jun 15, 2026

Fixes RANGER-5647.

This PR contains two CI fixes for agents-common:

  1. Date-bound tag test fixtures: testPolicyEngine_hiveForTag_filebased fails on/after 2026-06-15
  2. Thread-safe dynamic path matching: intermittent testPolicyEngine_hdfs_resourcespec failure under concurrent policy evaluation

1. Tag fixture dates

Tag policy tests use RESTRICTED-FINAL with a deny exception conditioned on ctx.isAccessedBefore('activation_date'). Fixture dates were 2026/06/15; once the system clock reaches that date, the condition no longer holds and CI fails.

Fix: change 2026/06/15 to 2099/12/31 in:

  • agents-common/src/test/resources/policyengine/resourceTags.json
  • agents-common/src/test/resources/policyengine/ACLResourceTags.json
  • agents-common/src/test/resources/policyengine/plugin/resourceTags.json
  • agents-common/src/test/resources/policyengine/descendant_tags.json
  • agents-common/src/test/resources/policyengine/test_policyengine_tag_hive.json
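
To illustrate why a fixed fixture date makes a test time-dependent, here is a minimal sketch. It is not Ranger's implementation: the class name, the strict "before" comparison, and the yyyy/MM/dd parsing are assumptions chosen to match the fixture values quoted above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DateBoundFixtureSketch {
    // Assumed format, matching the yyyy/MM/dd fixture values quoted above.
    private static final DateTimeFormatter FIXTURE_FORMAT =
            DateTimeFormatter.ofPattern("uuuu/MM/dd");

    // Hypothetical stand-in for a date-bound condition: true only while
    // "today" is strictly before the date stored in the tag attribute.
    static boolean isAccessedBefore(String attributeDate, LocalDate today) {
        return today.isBefore(LocalDate.parse(attributeDate, FIXTURE_FORMAT));
    }

    public static void main(String[] args) {
        LocalDate dayBefore  = LocalDate.of(2026, 6, 14);
        LocalDate fixtureDay = LocalDate.of(2026, 6, 15);

        System.out.println(isAccessedBefore("2026/06/15", dayBefore));   // true
        System.out.println(isAccessedBefore("2026/06/15", fixtureDay));  // false
        System.out.println(isAccessedBefore("2099/12/31", fixtureDay));  // true
    }
}
```

Under these assumptions the old fixture value flips from true to false on the fixture date itself, while 2099/12/31 keeps the condition true for any realistic CI clock.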

2. Concurrent {USER} path matching (hdfs_resourcespec)

Symptom

TestPolicyEngine.testPolicyEngine_hdfs_resourcespec
ALLOW 'read /home/user1/tmp/sales.db' for user=user1
expected: true but was: false

TestPolicyEngine.runTestCaseTests() uses parallelStream() on a shared RangerPolicyEngine. That is intentional — production plugins evaluate concurrently on one engine. The fix belongs in product code, not in serializing tests.

Code path (end-to-end)

Policy under test (test_policyengine_hdfs_resourcespec.json, policy id 2):

"resources": { "path": { "values": ["/home/{USER}/"], "isRecursive": true } },
"policyItems": [ { "users": ["{USER}"], "accesses": [ { "type": "read" } ] } ]

Step 1 — Matcher built once at policy-engine init

RangerPathResourceMatcher.buildResourceMatchers() turns /home/{USER}/ into /home/{USER}/* (recursive + trailing /), then creates one RecursiveWildcardResourceMatcher per policy value. Because {USER} is present, setDelimiters() sets tokenReplacer, so getNeedsDynamicEval() returns true — the path cannot be pre-expanded at init.

// RangerPathResourceMatcher.java — constructor: cache split only for STATIC paths
if (!getNeedsDynamicEval()) {
    wildcardPathElements = StringUtils.split(value, pathSeparatorChar);
}
// For /home/{USER}/* → getNeedsDynamicEval()==true → wildcardPathElements stays null

This matcher object is stored inside the policy evaluator and reused for every request that may hit policy id 2.

Step 2 — Each request gets its own token context

RangerDefaultRequestProcessor.preProcess() sets per-request context:

RangerAccessRequestUtil.setCurrentUserInContext(request.getContext(), request.getUser());
// → context["token:USER"] = "user1" or "user2"

Each RangerAccessRequest has its own context Map — that part is thread-safe.

Step 3 — Evaluation reaches the shared matcher

RangerDefaultPolicyEvaluator.evaluate() calls:

resourceMatcher.getMatchType(request.getResource(), scopes, request.getContext());

which eventually calls RecursiveWildcardResourceMatcher.isMatch(resourceValue, evalContext) where:

  • resourceValue = "/home/user1/tmp/sales.db" (same for both parallel sub-tests)
  • evalContext = that request's context (different token:USER per thread)

Step 4 — Token expansion (per-request, always correct)

ResourceMatcher.getExpandedValue(evalContext) replaces {USER} from context:

if (tokenReplacer != null) {
    ret = tokenReplacer.replaceTokens(ret, evalContext);
    // user1 → "/home/user1/*"
    // user2 → "/home/user2/*"
}
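
As a rough illustration of this step, the sketch below expands {USER}-style tokens from a per-request context map. It is not Ranger's tokenReplacer: the class name, the regular expression, and the choice to leave unknown tokens unchanged are assumptions. Only the "token:USER" context key comes from the description above.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenExpansionSketch {
    // Matches tokens such as {USER}; the allowed token characters are an assumption.
    private static final Pattern TOKEN = Pattern.compile("\\{([A-Za-z_]+)\\}");

    // Replaces each {NAME} with evalContext["token:NAME"]; tokens that are
    // missing from the context are left as-is (an assumption of this sketch).
    static String expand(String value, Map<String, Object> evalContext) {
        Matcher matcher = TOKEN.matcher(value);
        StringBuffer expanded = new StringBuffer();

        while (matcher.find()) {
            Object replacement = evalContext.get("token:" + matcher.group(1));
            String text = (replacement != null) ? replacement.toString() : matcher.group();

            matcher.appendReplacement(expanded, Matcher.quoteReplacement(text));
        }
        matcher.appendTail(expanded);

        return expanded.toString();
    }
}
```

Because the result depends only on the arguments, two threads calling expand() with different context maps cannot affect each other, which is consistent with this step being correct before the fix.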

Step 5 — Segment-by-segment matching

isMatch() passes the expanded path and split segments into RangerPathResourceMatcher.isRecursiveWildCardMatch(), which walks path components:

// isRecursiveWildCardMatch() — compares each segment of the request path
// against wildcardPathElements[pathElementIndex]
for (String p : pathElements) {          // request: "", "home", "user1", "tmp", "sales.db"
    String wp = wildcardPathElements[pathElementIndex];  // policy: "", "home", "user1", "*"
    // segment "user1" must equal wp "user1" for user1 to match
    // segment "user1" vs wp "user2" → mismatch → deny
}
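
The segment walk can be sketched in a self-contained form as follows. This is a deliberate simplification, not the real isRecursiveWildCardMatch(): it uses plain String.split instead of StringUtils.split, treats only a whole-segment "*" as a wildcard, and ignores case handling. The class and method names are hypothetical.

```java
class SegmentMatchSketch {
    // Compares the request path against the expanded policy path segment by segment.
    // A policy segment equal to "*" accepts the remainder of the request path.
    static boolean matches(String resourcePath, String expandedPolicyPath) {
        String[] requestSegments = resourcePath.split("/");        // "", "home", "user1", "tmp", "sales.db"
        String[] policySegments  = expandedPolicyPath.split("/");  // "", "home", "user1", "*"

        for (int i = 0; i < policySegments.length; i++) {
            String wp = policySegments[i];

            if (wp.equals("*")) {
                return true;   // wildcard reached: everything below this point matches
            }
            if (i >= requestSegments.length || !requestSegments[i].equals(wp)) {
                return false;  // e.g. request "user1" vs policy "user2"
            }
        }
        // No wildcard in the policy path: require an exact segment-for-segment match.
        return requestSegments.length == policySegments.length;
    }
}
```

With these simplifications, matches("/home/user1/tmp/sales.db", "/home/user1/*") succeeds and the same request against "/home/user2/*" fails at the third segment, which is the comparison that matters for the failure described next.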

The bug was not in token expansion or in per-request context — it was in where the split policy segments were stored between steps 4 and 5.


Before fix — race on shared instance field

RecursiveWildcardResourceMatcher declared split segments as an instance field:

static class RecursiveWildcardResourceMatcher extends AbstractPathResourceMatcher {
    String[] wildcardPathElements;   // ONE field per matcher object, shared by all threads

    boolean isMatch(String resourceValue, Map<String, Object> evalContext) {
        String expandedValue;
        if (getNeedsDynamicEval()) {
            expandedValue        = getExpandedValue(evalContext);   // per-thread: correct
            wildcardPathElements = StringUtils.split(expandedValue, pathSeparatorChar); // BUG: writes shared field
        } else {
            expandedValue = value;
        }
        return function.apply(resourceValue, expandedValue, pathSeparatorChar, ioCase, wildcardPathElements);
        //                                                      reads shared field
    }
}

Two threads evaluating policy id 2 concurrently both enter isMatch() on the same RecursiveWildcardResourceMatcher instance:

Shared object: RecursiveWildcardResourceMatcher for policy "/home/{USER}/*"
               instance field: wildcardPathElements  ← single slot, both threads write here

Thread A (user1)                              Thread B (user2)
─────────────────────────────────────────────────────────────────────────────
isMatch("/home/user1/tmp/sales.db", ctxA)

expandedValue = getExpandedValue(ctxA)
  ctxA["token:USER"] = "user1"
  → expandedValue = "/home/user1/*"           isMatch("/home/user1/tmp/sales.db", ctxB)

                                              expandedValue = getExpandedValue(ctxB)
                                                ctxB["token:USER"] = "user2"
                                                → expandedValue = "/home/user2/*"

wildcardPathElements =                        wildcardPathElements =
  ["", "home", "user1", "*"]                    ["", "home", "user2", "*"]
  ▲ write to shared field                       ▲ overwrites A's array

function.apply(
  "/home/user1/tmp/sales.db",
  "/home/user1/*",              ← A's expandedValue is still correct (local var)
  wildcardPathElements)         ← but reads ["", "home", "user2", "*"]  WRONG

isRecursiveWildCardMatch compares:
  request segment[2] = "user1"
  policy  segment[2] = "user2"   ← from B's overwrite
  → mismatch at index 2
  → return false
  → isAllowed = false            ✗  (expected true for user1)

Why expandedValue being local wasn't enough: expandedValue is a local String and stays correct per thread, but isRecursiveWildCardMatch() uses both the full wildcardPath string and the wildcardPathElements array for the optimized segment walk. When the array is corrupted by another thread, the segment comparison fails even though the string /home/user1/* on the stack is right.

Why it is intermittent: This is a data race — it only fails when Thread B writes wildcardPathElements between Thread A's write and Thread A's function.apply(). Single-threaded runs and low-contention local tests usually pass; CI full-suite load (agents-common, 502 tests, ForkJoin pool) hits it more often.
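
Since the real race depends on thread scheduling, the sketch below replays the failing interleaving deterministically on a single thread by splitting the pre-fix isMatch() into its write half and its read half. All names are hypothetical, and the matching logic is a simplified stand-in for the real segment walk (plain String.split, whole-segment "*" only).

```java
class SharedFieldRaceSketch {
    // Single shared slot, standing in for the instance field written by the pre-fix isMatch().
    private String[] wildcardPathElements;

    // First half of the pre-fix isMatch(): split the expanded value into the shared field.
    void writeSegments(String expandedValue) {
        wildcardPathElements = expandedValue.split("/");
    }

    // Second half: compare the request path against whatever the shared field holds now.
    boolean matchAgainstSharedSegments(String resourceValue) {
        String[] requestSegments = resourceValue.split("/");

        for (int i = 0; i < wildcardPathElements.length; i++) {
            String wp = wildcardPathElements[i];

            if (wp.equals("*")) {
                return true;
            }
            if (i >= requestSegments.length || !requestSegments[i].equals(wp)) {
                return false;
            }
        }
        return requestSegments.length == wildcardPathElements.length;
    }

    // Replays: A writes, B overwrites, A reads. A ends up comparing against B's array.
    static boolean replayBadInterleaving() {
        SharedFieldRaceSketch shared = new SharedFieldRaceSketch();

        shared.writeSegments("/home/user1/*");   // thread A (user1)
        shared.writeSegments("/home/user2/*");   // thread B (user2) overwrites the slot
        return shared.matchAgainstSharedSegments("/home/user1/tmp/sales.db");  // thread A reads
    }

    // Replays: A writes, A reads, with no other writer in between.
    static boolean replayUncontended() {
        SharedFieldRaceSketch shared = new SharedFieldRaceSketch();

        shared.writeSegments("/home/user1/*");
        return shared.matchAgainstSharedSegments("/home/user1/tmp/sales.db");
    }
}
```

In this replay, replayUncontended() returns true and replayBadInterleaving() returns false, mirroring the "expected: true but was: false" symptom without relying on timing.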


After fix — local pathElements per isMatch() call

boolean isMatch(String resourceValue, Map<String, Object> evalContext) {
    String   expandedValue;
    String[] pathElements;                        // local variable: one per call, per thread stack

    if (getNeedsDynamicEval()) {
        expandedValue = getExpandedValue(evalContext);
        pathElements  = StringUtils.split(expandedValue, pathSeparatorChar);  // no shared write
    } else {
        expandedValue = value;
        pathElements  = wildcardPathElements;     // static paths: read-only cache from init
    }
    return function.apply(resourceValue, expandedValue, pathSeparatorChar, ioCase, pathElements);
}

Static paths (no {USER}): unchanged — wildcardPathElements is still computed once in the constructor and reused read-only.

Dynamic paths ({USER}): split result lives on the thread stack for the duration of one isMatch() call; the instance field is never written during evaluation.

Shared object: RecursiveWildcardResourceMatcher for policy "/home/{USER}/*"
               instance field wildcardPathElements = null (never written for dynamic paths)

Thread A (user1)                              Thread B (user2)
─────────────────────────────────────────────────────────────────────────────
expandedValue_A = "/home/user1/*"             expandedValue_B = "/home/user2/*"

pathElements_A =                              pathElements_B =
  ["", "home", "user1", "*"]                    ["", "home", "user2", "*"]
  (stack-local)                                 (stack-local)
  A cannot see B's array                        B cannot see A's array

function.apply(..., pathElements_A)           function.apply(..., pathElements_B)

isRecursiveWildCardMatch:                      isRecursiveWildCardMatch:
  "user1" == "user1" ✓                            "user1" vs "user2" ✗
  under /home/user1/* ✓                           not under /home/user2/*
  → MATCH → isAllowed=true ✓                      → NO MATCH → isAllowed=false ✓

No shared mutable state between threads during the match.
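
The same idea can be exercised with a small stand-alone sketch that shares one matcher across a parallel stream, loosely mirroring how the test shares one engine. This is an illustration under assumptions, not the patched Ranger class: the class name, the String.replace-based expansion, and the simplified segment walk are all hypothetical.

```java
import java.util.stream.IntStream;

class LocalSegmentsMatcherSketch {
    private final String template;   // immutable after construction, e.g. "/home/{USER}/*"

    LocalSegmentsMatcherSketch(String template) {
        this.template = template;
    }

    // Everything derived from the request lives in local variables,
    // so concurrent callers never observe each other's split segments.
    boolean isMatch(String resourceValue, String user) {
        String   expandedValue   = template.replace("{USER}", user);
        String[] pathElements    = expandedValue.split("/");
        String[] requestSegments = resourceValue.split("/");

        for (int i = 0; i < pathElements.length; i++) {
            String wp = pathElements[i];

            if (wp.equals("*")) {
                return true;
            }
            if (i >= requestSegments.length || !requestSegments[i].equals(wp)) {
                return false;
            }
        }
        return requestSegments.length == pathElements.length;
    }

    // Evaluates user1 and user2 requests in parallel on ONE shared matcher
    // and counts results that differ from the expected outcome.
    static long countWrongResults(int iterations) {
        LocalSegmentsMatcherSketch shared = new LocalSegmentsMatcherSketch("/home/{USER}/*");

        return IntStream.range(0, iterations).parallel().filter(i -> {
            String  user     = (i % 2 == 0) ? "user1" : "user2";
            boolean expected = user.equals("user1");   // only user1 owns /home/user1

            return shared.isMatch("/home/user1/tmp/sales.db", user) != expected;
        }).count();
    }
}
```

Because isMatch() touches no mutable shared state, countWrongResults() should return 0 for any iteration count and any thread schedule, which is the property the fix is meant to restore.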


Static vs dynamic paths — which failed?

The CI failure was on a dynamic path only. Static paths in the same test file were never affected.

Failed sub-test (dynamic path — policy id 2):

"resources": { "path": { "values": ["/home/{USER}/"], "isRecursive": true } },
"policyItems": [ { "users": ["{USER}"], "accesses": [ { "type": "read" } ] } ]
  • {USER} is a runtime token → getNeedsDynamicEval() returns true
  • Path is expanded per request from context["token:USER"]:
    • user1 → /home/user1/*
    • user2 → /home/user2/*
  • Split segments were written to the shared instance field wildcardPathElements on every isMatch() call → race under parallel evaluation
  • This is the sub-test that failed: ALLOW 'read /home/user1/tmp/sales.db' for user=user1

Passing sub-test in same file (static path — policy id 1):

"resources": { "path": { "values": ["/finance/rest*ricted/"], "isRecursive": true } },
"policyItems": [ { "groups": ["finance"], "accesses": [ { "type": "read" } ] } ]
  • No {USER} or other tokens → getNeedsDynamicEval() returns false
  • Segments split once in the constructor and stored in wildcardPathElements read-only
  • Never rewritten during isMatch(): no race, unaffected by this bug

| | Static path | Dynamic path ({USER}) |
|---|---|---|
| Example in hdfs_resourcespec | /finance/rest*ricted/ (policy id 1) | /home/{USER}/ (policy id 2) |
| getNeedsDynamicEval() | false | true |
| When segments are split | Once at matcher init | Every isMatch() call |
| Where segments stored (before fix) | Instance field, write-once | Instance field, rewritten per request |
| CI failure? | No | Yes (intermittent under parallelStream) |
| Fix changes this path? | No, still uses init-time cache | Yes, uses local pathElements per call |

Behaviour impact

| Scenario | Before | After |
|---|---|---|
| Single-threaded evaluation | Correct | Correct (identical logic) |
| Static paths (no tokens) | Correct | Correct (same init-time cache) |
| Concurrent {USER} evaluation | Intermittently wrong | Correct |

No intentional semantic change — only removes incorrect cross-thread corruption of split path segments on dynamic paths.

File: agents-common/src/main/java/org/apache/ranger/plugin/resourcematcher/RangerPathResourceMatcher.java


Test plan

  • mvn test -pl agents-common -Dtest=TestPolicyEngine#testPolicyEngine_hiveForTag_filebased
  • mvn test -pl agents-common -Dtest=TestPolicyEngine#testPolicyEngine_hdfs_resourcespec (30 consecutive runs)
  • mvn test -pl agents-common -Dtest=RangerPathResourceMatcherTest
  • CI build-17 green on agents-common

Related

  • Split from #1017 (RANGER-5645) per review: tag fixture date updates belong in a separate Jira/PR.

RESTRICTED-FINAL deny-exception uses isAccessedBefore(activation_date).
Fixture dates were 2026/06/15, so TestPolicyEngine_hiveForTag_filebased
failed on/after that day. Use 2099/12/31 in tag test JSON so CI stays
stable without changing policy semantics under test.

Co-authored-by: Cursor <cursoragent@cursor.com>
RecursiveWildcardResourceMatcher reused a shared wildcardPathElements
field when expanding {USER} tokens, causing intermittent deny results
under concurrent policy evaluation (e.g. hdfs_resourcespec in CI).

Co-authored-by: Cursor <cursoragent@cursor.com>
ramackri merged commit 38d2153 into master on Jun 15, 2026
11 of 12 checks passed
ramackri pushed a commit to ramackri/ranger that referenced this pull request Jun 16, 2026
PR apache#1018 updated slash-format tag fixture dates to 2099/12/31 but left
2026-06-15 ISO expiry_date values in test_policyengine_tag_hive.json.
After 2026-06-15, TestPolicyEngine.testPolicyEngine_hiveForTag fails CI
with isAllowed expected true but was false for EXPIRES_ON SELF match.

Co-authored-by: Cursor <cursoragent@cursor.com>
ramackri added a commit that referenced this pull request Jun 16, 2026
…1021)

PR #1018 updated slash-format tag fixture dates to 2099/12/31 but left
2026-06-15 ISO expiry_date values in test_policyengine_tag_hive.json.
After 2026-06-15, TestPolicyEngine.testPolicyEngine_hiveForTag fails CI
with isAllowed expected true but was false for EXPIRES_ON SELF match.

Co-authored-by: ramk <ramk@cloudera.com>
Co-authored-by: Cursor <cursoragent@cursor.com>