HPCC-32908 Initial stab at implementing a IPropertyTree visitor#21054
HPCC-32908 Initial stab at implementing a IPropertyTree visitor#21054ghalliday wants to merge 19 commits into
Conversation
…IPropertyTree - Add VisitResult enum class (Continue, Stop, SkipChildren) to jptree.hpp - Add IPropertyTreeVisitor interface with single visit(node, attrName) method where attrName is nullptr for element matches and the attribute name when the xpath tail resolves to an attribute - Add pure virtual visit() declaration to IPropertyTree interface - Add visit() override declaration to PTree in jptree.ipp
Add PTree::visit(const char *xpath, IPropertyTreeVisitor &) which walks all
nodes matching the xpath and calls visitor.visit() for each match without
creating any heap-allocated IPropertyTreeIterator objects.
Xpath dispatch mirrors getElements():
- null/empty -> visit this node
- . -> self reference, continue with remainder
- / -> separator, skip; error if root
- // -> recursive descent: match at this level then descend
into all children via visitAllChildren()
- [N] -> numeric index into CPTArray
- [expr] -> checkPattern() qualifier on this node
- tag / tag* -> ChildMap lookup (exact or wildcard), CPTArray inline
expansion, optional qualifier via checkPattern(),
recurse on remainder
- attr tail -> pass attrName to visitor.visit() instead of nullptr
Add two private static PTree helpers (declared in jptree.ipp):
- PTree::visitChildContainer() - expands CPTArray or visits single element
- PTree::visitAllChildren() - iterates ChildMap for // descent
VisitResult::Stop propagates immediately up the call stack.
VisitResult::SkipChildren is respected at the leaf visitNode level by not
recursing further (the caller controls recursion via the return value).
Note: checkPattern() still uses getElements() internally for complex
qualifiers ([childTag], [childTag=val]); this is acceptable for v1 and
will be addressed in a future stage using nested visit() calls.
SkipChildren fix: SkipChildren is now consumed at the level where it is returned, preventing it from propagating up to a sibling loop and incorrectly halting iteration. Fixed in visitChildContainer (array and single-element paths), in the [N] index case inside visit(), and in the visitContainer lambda array loop. Only Stop is allowed to propagate upward across container/sibling boundaries. Stage 4 comment (in checkPattern): Documents how the two getElements()/getElements() iterator patterns inside checkPattern() can be replaced with nested visit() calls using lightweight boolean/comparison callback visitors, eliminating all iterator allocation from qualifier evaluation. Stage 5 comment (before PTree::visit): Documents the XPathSegments pre-parsed structure design: parse xpath once into a vector of XPathSegment values, then walk by segment index on recursive calls to eliminate repeated string scanning for child nodes. Includes the proposed struct layout, parseXPath() signature, and the single-segment fast-path that avoids any allocation.
…to non-static members
Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
There was a problem hiding this comment.
Pull request overview
Introduces an IPropertyTree::visit() API and supporting visitor interface to traverse IPropertyTree nodes matching an xpath, with internal optimizations to avoid iterator allocations for common cases.
Changes:
- Added
VisitResultandIPropertyTreeVisitor, plus a newIPropertyTree::visit()method. - Implemented
PTree::visit()with helpers for tags, qualifiers, wildcards, and//recursive descent. - Added
ChildMaphelpers to iterate buckets directly (wildcard/all) without allocatingIPropertyTreeIterator.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| system/jlib/jptree.hpp | Adds visitor types (VisitResult, IPropertyTreeVisitor) and the new IPropertyTree::visit() API. |
| system/jlib/jptree.ipp | Declares visit() in PTree and adds private helper declarations plus ChildMap visit helpers. |
| system/jlib/jptree.cpp | Implements traversal logic (PTree::visit*, ChildMap::visit*) and refactors qualifier matching via MatchVisitor. |
| if (qualifier) | ||
| { | ||
| const char *q = qualifier + 1; // skip '[' | ||
| if (isdigit(*q)) | ||
| { | ||
| // Numeric [N] on an array is handled in the outer [N] case of visit(); | ||
| // reaching here means no numeric qualifier applies per-element. | ||
| break; | ||
| } |
There was a problem hiding this comment.
The numeric qualifier [N] on a tag segment that resolves to a CPTArray will result in visiting no elements because the loop breaks. Since visitWithTag() forwards the qualifier to visitMatchedNode() (and does not otherwise handle [N] on that segment), foo[3]-style xpaths against arrays won’t work. Parse the numeric index here and visit only the selected element (1-based), then continue evaluation on remainder and propagate Stop if requested.
| // Scan past all chained qualifiers to find where the real remainder starts | ||
| const char *scan = remainder + 1; | ||
| for (;;) | ||
| { | ||
| char quote = 0; | ||
| while (']' != *scan || quote) | ||
| { | ||
| switch (*scan) | ||
| { | ||
| case '\"': | ||
| case '\'': | ||
| if (quote) | ||
| { | ||
| if (*scan == quote) quote = 0; | ||
| } | ||
| else | ||
| quote = *scan; | ||
| break; | ||
| case '\0': | ||
| throw MakeXPathException(_xpath, PTreeExcpt_XPath_ParseError, scan-_xpath, "Qualifier brace unclosed"); | ||
| } | ||
| ++scan; | ||
| } | ||
| ++scan; // skip ']' | ||
| if ('[' != *scan) | ||
| break; | ||
| ++scan; // skip '[' of next qualifier | ||
| } |
There was a problem hiding this comment.
This code explicitly supports chained qualifiers (e.g. foo[@a][@b]/...) when computing remainder, but only a single qualifier pointer (to the first [) is forwarded into visitMatchedNode(). As a result, only the first qualifier is evaluated and subsequent qualifiers are ignored. Consider passing the full qualifier span (from first [ up to remainder) and iterating through each [...] in visitMatchedNode() (or adding a helper that applies all qualifiers sequentially) so chained qualifiers behave as expected.
| // Scan past all chained qualifiers to find where the real remainder starts | |
| const char *scan = remainder + 1; | |
| for (;;) | |
| { | |
| char quote = 0; | |
| while (']' != *scan || quote) | |
| { | |
| switch (*scan) | |
| { | |
| case '\"': | |
| case '\'': | |
| if (quote) | |
| { | |
| if (*scan == quote) quote = 0; | |
| } | |
| else | |
| quote = *scan; | |
| break; | |
| case '\0': | |
| throw MakeXPathException(_xpath, PTreeExcpt_XPath_ParseError, scan-_xpath, "Qualifier brace unclosed"); | |
| } | |
| ++scan; | |
| } | |
| ++scan; // skip ']' | |
| if ('[' != *scan) | |
| break; | |
| ++scan; // skip '[' of next qualifier | |
| } | |
| // Scan past the first qualifier to find where the remainder starts | |
| const char *scan = remainder + 1; | |
| char quote = 0; | |
| while (']' != *scan || quote) | |
| { | |
| switch (*scan) | |
| { | |
| case '\"': | |
| case '\'': | |
| if (quote) | |
| { | |
| if (*scan == quote) | |
| quote = 0; | |
| } | |
| else | |
| quote = *scan; | |
| break; | |
| case '\0': | |
| throw MakeXPathException(_xpath, PTreeExcpt_XPath_ParseError, scan-_xpath, "Qualifier brace unclosed"); | |
| } | |
| ++scan; | |
| } | |
| // 'scan' now points at the closing ']'; move remainder to first char after it | |
| ++scan; |
| // NULL or empty xpath — visit this node directly (no further navigation) | ||
| if (nullptr == xpath || '\0' == *xpath) | ||
| return visitor.visit(*this, nullptr); | ||
| if ('@' == *xpath) | ||
| return visitor.visit(*this, xpath); |
There was a problem hiding this comment.
PTree::visit() can return VisitResult::SkipChildren directly to callers in these base cases, but the new IPropertyTree::visit() contract in the header comment says it returns Stop for early stop and otherwise Continue. If SkipChildren is intended to be purely an internal traversal control, normalize it to Continue at the API boundary (including these base cases) to match the documented behavior.
| // attrName - nullptr for element matches; the attribute name (e.g. "@foo") when the | ||
| // xpath tail resolves to an attribute | ||
| interface jlib_decl IPropertyTreeVisitor | ||
| { |
There was a problem hiding this comment.
IPropertyTreeVisitor is a polymorphic interface but does not declare a virtual destructor. If a visitor is ever deleted via an IPropertyTreeVisitor*, that would be undefined behavior. Add a virtual ~IPropertyTreeVisitor() = default; (or an equivalent protected virtual destructor) to make the interface safe for typical C++ usage.
| { | |
| { | |
| virtual ~IPropertyTreeVisitor() = default; |
| // MatchVisitor encapsulates the parsing and evaluation of a single qualifier expression | ||
| // of the form used inside [...] in an xpath. Call matches() once; it parses the expression | ||
| // from xxpath, evaluates it against the supplied node, advances xxpath past the closing ']', | ||
| // and returns true if the node satisfies the qualifier. |
There was a problem hiding this comment.
The comment says matches() “advances xxpath past the closing ']'”, but the implementation sets path = xpath; where xpath is positioned at ']' (and callers like visitWithQualifier() explicitly check ']' after checkPattern()). Update the comment to reflect the actual contract (i.e., leaves the pointer at ']' for the caller to consume), or adjust the code to match the documented behavior.
|
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32908 Jirabot Action Result: |
| // If the remainder is just '/' with nothing after, treat as empty | ||
| if ('/' == *remainder && '\0' == *(remainder+1)) | ||
| remainder = remainder + 1; | ||
|
|
||
| if (!checkChildren()) | ||
| return VisitResult::Continue; | ||
|
|
||
| if (wild) | ||
| { | ||
| // Wildcard — walk every bucket in the ChildMap directly (no iterator allocation) | ||
| VisitResult r = children->visitWild(id, isnocase(), qualifier, remainder, visitor); | ||
| if (r == VisitResult::Stop) | ||
| return VisitResult::Stop; | ||
| } | ||
| else | ||
| { | ||
| // Exact name lookup | ||
| IPropertyTree *child = children->query(id); | ||
| if (child) | ||
| { | ||
| VisitResult r = static_cast<PTree *>(child)->visitMatchedNode(qualifier, remainder, visitor); | ||
| if (r == VisitResult::Stop) | ||
| return VisitResult::Stop; | ||
| } | ||
| } |
There was a problem hiding this comment.
visitWithTag() passes remainder through to deeper visit() calls including the leading '/' separator for multi-segment xpaths (e.g. "a/b" produces remainder "/b"). In PTree::visit(), a leading '/' with root==true throws "Root specifier "/" is not supported", so normal multi-segment paths will fail. Fix by stripping a single leading '/' from remainder before recursing (but preserve the '//' case), and apply the same normalization for remainders produced after qualifier parsing.
| return r; | ||
| } | ||
| else if (root) | ||
| throw MakeXPathException(xpath, PTreeExcpt_XPath_Unsupported, 0, "Root specifier \"/\" is not supported"); |
There was a problem hiding this comment.
visitWithTag() passes remainder through to deeper visit() calls including the leading '/' separator for multi-segment xpaths (e.g. "a/b" produces remainder "/b"). In PTree::visit(), a leading '/' with root==true throws "Root specifier "/" is not supported", so normal multi-segment paths will fail. Fix by stripping a single leading '/' from remainder before recursing (but preserve the '//' case), and apply the same normalization for remainders produced after qualifier parsing.
| throw MakeXPathException(xpath, PTreeExcpt_XPath_Unsupported, 0, "Root specifier \"/\" is not supported"); | |
| { | |
| // Treat a single leading '/' at the root as a normal path separator. | |
| // This avoids rejecting multi-segment paths where recursion passes | |
| // a remainder like "/b" into visit(), while preserving the '//' case above. | |
| root = false; | |
| if ('\0' == *xpath) | |
| return visitor.visit(*this, nullptr); | |
| goto restart; | |
| } |
|
|
||
| if (']' == *q) q++; // skip ']' | ||
| } | ||
| else | ||
| matched = elem->newCheckPattern(q); // advances q past ']' |
There was a problem hiding this comment.
MatchVisitor::matches() claims (and call sites assume) it advances the input pointer past the closing ], but it currently assigns path = xpath without consuming the ]. This breaks chained qualifiers for non-numeric expressions (e.g. tag[@a][@b]), because q won't advance to the next '[' and the loop will stop after the first qualifier. Make pointer advancement consistent: either (a) have matches() consume the closing ] before updating path, or (b) update all callers/comments to explicitly consume ] and continue scanning for the next qualifier.
| if (']' == *q) q++; // skip ']' | |
| } | |
| else | |
| matched = elem->newCheckPattern(q); // advances q past ']' | |
| if (']' == *q) q++; // skip ']' | |
| } | |
| else | |
| { | |
| matched = elem->newCheckPattern(q); // evaluate non-numeric qualifier | |
| // Ensure we always consume the closing ']' before looking for the next qualifier | |
| if (matched && ']' == *q) | |
| q++; // skip ']' | |
| } |
| // of the form used inside [...] in an xpath. Call matches() once; it parses the expression | ||
| // from xxpath, evaluates it against the supplied node, advances xxpath past the closing ']', | ||
| // and returns true if the node satisfies the qualifier. |
There was a problem hiding this comment.
MatchVisitor::matches() claims (and call sites assume) it advances the input pointer past the closing ], but it currently assigns path = xpath without consuming the ]. This breaks chained qualifiers for non-numeric expressions (e.g. tag[@a][@b]), because q won't advance to the next '[' and the loop will stop after the first qualifier. Make pointer advancement consistent: either (a) have matches() consume the closing ] before updating path, or (b) update all callers/comments to explicitly consume ] and continue scanning for the next qualifier.
| else | ||
| visit(node, nullptr); | ||
| } | ||
|
|
There was a problem hiding this comment.
MatchVisitor::matches() claims (and call sites assume) it advances the input pointer past the closing ], but it currently assigns path = xpath without consuming the ]. This breaks chained qualifiers for non-numeric expressions (e.g. tag[@a][@b]), because q won't advance to the next '[' and the loop will stop after the first qualifier. Make pointer advancement consistent: either (a) have matches() consume the closing ] before updating path, or (b) update all callers/comments to explicitly consume ] and continue scanning for the next qualifier.
| if (*xpath == ']') | |
| xpath++; |
| if (']' != *xpath) | ||
| throw MakeXPathException(_xpath, PTreeExcpt_XPath_ParseError, xpath-_xpath, "Qualifier brace unclosed"); |
There was a problem hiding this comment.
The error text says "Qualifier brace unclosed" but the syntax uses brackets [...]. Consider changing "brace" to "bracket" to make the message more precise and consistent with the XPath syntax being parsed.
| quote = *scan; | ||
| break; | ||
| case '\0': | ||
| throw MakeXPathException(_xpath, PTreeExcpt_XPath_ParseError, scan-_xpath, "Qualifier brace unclosed"); |
There was a problem hiding this comment.
The error text says "Qualifier brace unclosed" but the syntax uses brackets [...]. Consider changing "brace" to "bracket" to make the message more precise and consistent with the XPath syntax being parsed.
| throw MakeXPathException(_xpath, PTreeExcpt_XPath_ParseError, scan-_xpath, "Qualifier brace unclosed"); | |
| throw MakeXPathException(_xpath, PTreeExcpt_XPath_ParseError, scan-_xpath, "Qualifier bracket unclosed"); |
| { | ||
| // Multiple siblings in a CPTArray: check qualifier per-element, then recurse | ||
| unsigned n = val->elements(); | ||
| unsigned indexCounts[8] = {0}; // Support up to 8 chained numeric qualifiers |
There was a problem hiding this comment.
The limit 8 for chained numeric qualifiers is a magic number duplicated in multiple places. Consider introducing a named constant (e.g. constexpr unsigned maxChainedIndexQualifiers = 8;) to centralize the limit and avoid accidental divergence if it needs to change.
| StringAttr idxstr; | ||
| q = readIndex(q, idxstr); | ||
| unsigned requiredIdx = atoi(idxstr.get()); | ||
| if (numQualIndex < 8) |
There was a problem hiding this comment.
The limit 8 for chained numeric qualifiers is a magic number duplicated in multiple places. Consider introducing a named constant (e.g. constexpr unsigned maxChainedIndexQualifiers = 8;) to centralize the limit and avoid accidental divergence if it needs to change.
e724e28 to
cc22aa6
Compare
Type of change:
Checklist:
Smoketest:
Testing: