Feat/add validations#106
Merged
- Add explicit CSV guard to the post_conversion filter in §2.1 decision diagram, making it consistent with the §1.2 prose that states post_conversion has no effect for .xbrl files.
- Replace imprecise claim in §2.2 that "sections 2.8–2.14 are executed" with accurate description: only rules marked Post-conv. = Yes survive (sections 2.11–2.14), while entity (§2.8), decimals (§2.9), and currency (§2.10) are also skipped.
Closes #62, Closes #63
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Define the technical architecture for the xbridge validation module:
- Rule registry schema (registry.json) with format-specific overrides
- Module structure with 22 rule implementation files
- Core components: models, registry, engine, context
- Rule selection logic matching the specification decision diagram
- Public API (validate function) and integration points
- Full rule coverage summary (98 unique rules)
Companion to validation_specification.md and validations_enumeration.md.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Revised version and date to reflect the latest draft.
- Enhanced rule attributes section to clarify execution conditions.
- Organized rules into XML Instance and CSV Report Package categories.
- Updated descriptions and added EBA references for various rules.
- Improved clarity and consistency in rule formatting and structure.
Implement the core data classes for the validation module:
- Severity enum (ERROR, WARNING, INFO)
- RuleDefinition with format-specific severity/message overrides
- ValidationResult matching specification §1.5
Includes 17 passing tests, ruff clean, mypy strict clean.
Closes #65
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement the registry module linking JSON rule definitions to Python implementation functions via a decorator pattern:
- load_registry() reads and parses registry.json
- @rule_impl decorator registers implementation functions
- get_rule_impl() resolves implementations with format-specific priority
- Initial registry.json with XML-001 entry
Includes 12 passing tests, ruff clean, mypy strict clean.
Closes #66
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
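The decorator-based registration described in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: the internal `_IMPLS` table, the `fmt` keyword, and the function bodies are assumptions made for the example; only the names `rule_impl` and `get_rule_impl` come from the commit message.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical internal table: (rule code, format or None) -> implementation.
_IMPLS: Dict[Tuple[str, Optional[str]], Callable] = {}


def rule_impl(code: str, fmt: Optional[str] = None):
    """Register the decorated function as the implementation of `code`.

    `fmt` (assumed parameter) restricts the implementation to one input format.
    """
    def decorator(func: Callable) -> Callable:
        key = (code, fmt)
        if key in _IMPLS:
            raise ValueError(f"duplicate implementation for {key}")
        _IMPLS[key] = func
        return func
    return decorator


def get_rule_impl(code: str, fmt: str) -> Optional[Callable]:
    """Prefer a format-specific implementation, then fall back to the generic one."""
    return _IMPLS.get((code, fmt)) or _IMPLS.get((code, None))


@rule_impl("XML-001")
def check_generic(ctx):
    return "generic"


@rule_impl("XML-001", fmt="csv")
def check_csv(ctx):
    return "csv-specific"
```

Raising on a duplicate key is one way to surface accidental double registration early; whether the actual module does this is not stated in the PR.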
Implement the context object passed to every rule implementation:
- Carries rule_set, rule_definition, file_path, raw_bytes, and parsed instances (xml_instance, csv_instance, module)
- add_finding() renders message templates with format_map and gracefully handles missing placeholders
- Respects format-specific severity/message overrides from registry
Includes 10 passing tests, ruff clean, mypy strict clean.
Closes #67
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
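One common way to render a template with `str.format_map` while tolerating missing placeholders is a `dict` subclass with `__missing__`. The sketch below shows that technique in isolation; the helper names `_SafeDict` and `render_message` are hypothetical, and the PR does not say how `add_finding()` achieves this.

```python
class _SafeDict(dict):
    """dict that leaves unknown placeholders untouched instead of raising KeyError."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def render_message(template: str, **values: object) -> str:
    # format_map does not copy the mapping, so __missing__ is consulted
    # for every placeholder that has no value.
    return template.format_map(_SafeDict(values))


message = render_message("Expected {expected}, got {actual}", expected="UTF-8")
```

With this approach a finding is still reported when a rule forgets to pass a value; the unresolved placeholder simply stays visible in the message.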
Implement the orchestration layer tying registry, context, and rules:
- select_rules() filters by format, EBA flag, and post-conversion
- run_validation() detects format, loads registry, parses input, resolves taxonomy module, and executes rule implementations
- Graceful handling of parse failures and missing implementations
Includes 16 passing tests, ruff clean, mypy strict clean.
Closes #68
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
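The three filters named here can be sketched as below, assuming the semantics stated elsewhere in this PR (EBA rules run only when the EBA flag is set; post_conversion has no effect for .xbrl input; in post-conversion mode only rules marked Post-conv. = Yes survive). The `Rule` class, its field names, and the flags on the sample rules are illustrative assumptions, not the registry schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Rule:
    # Hypothetical, simplified stand-in for a registry entry.
    code: str
    formats: Tuple[str, ...]
    eba: bool = False
    post_conversion: bool = False


def select_rules(rules: Iterable[Rule], fmt: str,
                 eba: bool = False, post_conversion: bool = False) -> List[Rule]:
    selected = []
    for rule in rules:
        if fmt not in rule.formats:
            continue  # rule does not apply to this input format
        if rule.eba and not eba:
            continue  # EBA-only rule, EBA mode not requested
        # Explicit CSV guard: post_conversion is ignored for XML input.
        if post_conversion and fmt == "csv" and not rule.post_conversion:
            continue
        selected.append(rule)
    return selected


# Sample data; the eba/post_conversion flags here are made up for the example.
RULES = [
    Rule("XML-001", ("xml",)),
    Rule("XML-002", ("xml",), eba=True),
    Rule("CSV-001", ("csv",)),
    Rule("CSV-060", ("csv",), post_conversion=True),
]


def codes(rules: Iterable[Rule]) -> List[str]:
    return [r.code for r in rules]
```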
- Add XML-001 well-formedness check (xml_wellformedness.py)
- Add `xbridge validate` CLI subcommand with --eba, --post-conversion, --json flags
- Update CLI documentation with validate command usage and examples
- Fix test setup_method pattern to avoid double-registration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add XML-002 rule that verifies the XML declaration encoding is UTF-8 (case-insensitive). Files without an explicit encoding attribute pass since UTF-8 is the XML default. This is an EBA-only rule (ref §1.4). Closes #70 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
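The check described for XML-002 can be approximated with a regular expression over the raw bytes. This is a sketch under assumptions: the function name is hypothetical, and it only inspects an ASCII-compatible XML declaration at the start of the file (a UTF-16 encoded file would not match and would therefore pass here).

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL_ENCODING_RE = re.compile(
    rb'^\s*<\?xml\s[^>]*?encoding\s*=\s*["\']([^"\']+)["\']'
)


def declared_encoding_is_utf8(raw: bytes) -> bool:
    """Return True when the declaration says UTF-8 or names no encoding at all."""
    match = _DECL_ENCODING_RE.match(raw)
    if match is None:
        # No declaration or no encoding attribute: UTF-8 is the XML default.
        return True
    return match.group(1).decode("ascii", "replace").lower() == "utf-8"
```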
Add XML-003 rule that verifies the root element is
{http://www.xbrl.org/2003/instance}xbrl. Skips silently on malformed
XML (XML-001 handles that). Non-EBA rule, always runs.
Closes #71
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
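A minimal sketch of the XML-003 idea, using the standard library parser rather than whatever parser the module itself uses. The function name and the three-valued return are assumptions for illustration; `None` stands for "skipped because the document is malformed".

```python
import xml.etree.ElementTree as ET
from typing import Optional

XBRL_ROOT = "{http://www.xbrl.org/2003/instance}xbrl"


def root_is_xbrl(raw: bytes) -> Optional[bool]:
    """True/False for well-formed input, None when the XML cannot be parsed."""
    try:
        root = ET.fromstring(raw)
    except ET.ParseError:
        return None  # malformed XML is reported by XML-001, not here
    # ElementTree exposes tags in Clark notation: {namespace}localname.
    return root.tag == XBRL_ROOT
```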
XML-010: Exactly one link:schemaRef element MUST be present. XML-012: The schemaRef MUST resolve to a known entry point URL. Closes #72 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ng indicator structural checks XML-020: At least one find:fIndicators element MUST be present. XML-021: At least one filing indicator MUST exist. XML-025: No duplicate filing indicators. XML-026: Filing indicator contexts MUST NOT contain segment or scenario. Closes #73 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Validates that filing indicator codes match known table codes from the module JSON. Uses ctx.module.tables to build the set of valid codes, avoiding direct taxonomy access. Closes #74 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add six EBA context validation rules:
- XML-030: period dates must be xs:date (no dateTime/timezone)
- XML-031: all periods must be instants (not durations)
- XML-032: all periods must share the same reference date
- XML-033: all entity identifiers must be identical across contexts
- XML-034: xbrli:segment must not be used
- XML-035: xbrli:scenario children must be dimension members only
Performance: reuses already-parsed lxml tree from ctx.xml_instance.root when available, avoiding redundant XML parsing across all six rules.
Closes #75
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move XML parsing into the engine: xml_root is computed once and passed to every ValidationContext via a new xml_root attribute. All rule modules (xml_wellformedness, xml_root_element, xml_schema_ref, xml_filing_indicators, xml_context) now use ctx.xml_root instead of calling etree.fromstring() independently. Before: XML parsed N times (once per rule that needs the tree). After: XML parsed exactly once in run_validation(), shared across all rules. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add four fact validation rules:
- XML-040: @precision must not be used (use @decimals) [EBA]
- XML-041: @decimals value must be valid integer or "INF" [non-EBA]
- XML-042: @xsi:nil must not be used on facts [EBA]
- XML-043: string-type facts must not be empty [EBA]
Facts are identified as direct root children not in infrastructure namespaces (xbrli, link, find) using a frozenset for O(1) lookup. All rules use ctx.xml_root (single-parse architecture).
Closes #76
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
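The fact-identification approach and the XML-041 value check can be sketched as follows. The helper names are hypothetical, the sketch uses the standard library parser, and the `find` namespace URI is the one commonly used for Eurofiling filing indicators, stated here as an assumption.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator

INFRASTRUCTURE_NS = frozenset({
    "http://www.xbrl.org/2003/instance",                      # xbrli
    "http://www.xbrl.org/2003/linkbase",                      # link
    "http://www.eurofiling.info/xbrl/ext/filing-indicators",  # find (assumed URI)
})

_INTEGER_RE = re.compile(r"[+-]?[0-9]+")


def namespace_of(tag: str) -> str:
    """Extract the namespace URI from a Clark-notation tag, '' if none."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def iter_facts(root: ET.Element) -> Iterator[ET.Element]:
    """Yield direct children of the root that are not infrastructure elements."""
    for child in root:
        if isinstance(child.tag, str) and namespace_of(child.tag) not in INFRASTRUCTURE_NS:
            yield child


def decimals_is_valid(value: str) -> bool:
    """Accept the literal INF or an optionally signed integer."""
    return value == "INF" or _INTEGER_RE.fullmatch(value) is not None


SAMPLE = b"""<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:link="http://www.xbrl.org/2003/linkbase"
    xmlns:m="http://example.com/met">
  <link:schemaRef/>
  <xbrli:context id="c1"/>
  <m:mi53 contextRef="c1" decimals="-3">1000</m:mi53>
  <m:mi54 contextRef="c1" decimals="2.5">1</m:mi54>
</xbrli:xbrl>"""

facts = list(iter_facts(ET.fromstring(SAMPLE)))
invalid = [f.tag for f in facts if not decimals_is_valid(f.get("decimals", "INF"))]
```

A regular expression is used instead of `int()` because `int()` also accepts surrounding whitespace and digit-group underscores, which are not valid in this attribute.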
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Single-pass scan architecture: one iteration over every element collects prohibited elements/attributes (XML-060..064), contextRef/ unitRef inventories (XML-066, 068), and context/unit elements for duplicate detection (XML-067, 069). Result is cached per root so all 10 rules reuse the same scan. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
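The collect-once pattern described here can be sketched as below. It illustrates only the caching and the single iteration; the `ScanResult` fields, the `id(root)`-keyed cache, and the derived sets are assumptions for the example and do not reproduce what each XML-06x rule actually checks.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

XBRLI = "{http://www.xbrl.org/2003/instance}"


@dataclass
class ScanResult:
    context_ids: Counter = field(default_factory=Counter)
    unit_ids: Counter = field(default_factory=Counter)
    context_refs: Set[str] = field(default_factory=set)
    unit_refs: Set[str] = field(default_factory=set)


# Hypothetical cache: id(root) -> (root, result). Keeping the root in the
# value prevents its id from being reused while the entry exists.
_SCANS: Dict[int, Tuple[ET.Element, ScanResult]] = {}
scan_count = 0


def scan(root: ET.Element) -> ScanResult:
    global scan_count
    cached = _SCANS.get(id(root))
    if cached is not None and cached[0] is root:
        return cached[1]
    scan_count += 1
    result = ScanResult()
    for el in root.iter():  # one pass over every element
        if el.tag == XBRLI + "context":
            result.context_ids[el.get("id", "")] += 1
        elif el.tag == XBRLI + "unit":
            result.unit_ids[el.get("id", "")] += 1
        if "contextRef" in el.attrib:
            result.context_refs.add(el.attrib["contextRef"])
        if "unitRef" in el.attrib:
            result.unit_refs.add(el.attrib["unitRef"])
    _SCANS[id(root)] = (root, result)
    return result


ROOT = ET.fromstring(
    b'<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance" '
    b'xmlns:m="http://example.com/met">'
    b'<xbrli:context id="c1"/><xbrli:context id="c2"/><xbrli:unit id="u1"/>'
    b'<m:a contextRef="c1" unitRef="u1">1</m:a>'
    b'<m:b contextRef="c3">x</m:b>'
    b'</xbrli:xbrl>'
)

first = scan(ROOT)
second = scan(ROOT)  # served from the cache, no second iteration
unused_contexts = set(first.context_ids) - first.context_refs
undefined_refs = first.context_refs - set(first.context_ids)
```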
…hecks XML-070: fact concepts must be defined in the module taxonomy XML-071: explicit dimension QNames must be defined in the taxonomy XML-072: dimension member values must be valid for their dimension Uses cached taxonomy extraction from Module and single-pass XML scan. Open key dimensions skip member validation. Supports both datapoints and headers architecture. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Variable.from_dict strips namespace prefixes from dimension keys (e.g., "eba_dim:BAS" → "BAS"). The taxonomy extraction now handles both prefixed and bare localname keys, matching dimensions by localname only. Member values retain their prefix and are still resolved via namespace URI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
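Matching dimensions by localname, regardless of whether the key still carries a prefix, might look like the sketch below. The helper names are hypothetical, and the real taxonomy extraction is more involved than a single dictionary.

```python
from typing import Dict, Iterable


def localname(key: str) -> str:
    """Return the part after the last colon, or the whole key if it is bare."""
    return key.rsplit(":", 1)[-1]


def index_dimensions(keys: Iterable[str]) -> Dict[str, str]:
    """Map each dimension localname to the key as it was originally written."""
    return {localname(k): k for k in keys}


# Prefixed, bare, and versioned-prefix keys all end up under their localname.
dims = index_dimensions(["eba_dim:BAS", "MCY", "eba_dim_4.0:APL"])
```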
Regression tests that construct variables through Variable.from_dict (the real production deserialization path) to ensure dimension key prefix stripping does not break taxonomy validation. Covers: concept resolution, dimension matching with stripped prefixes, member validation, and versioned prefixes (e.g., eba_dim_4.0:BAS). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add *.skill to .gitignore
- Remove unused import and sort imports in test_validation_engine.py
- Add submission package naming rules (section 3) to validations spec
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dentifier checks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…urrency checks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… unit checks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… additional checks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add EBA-NAME-071: the root folder inside a CSV report package ZIP must match the ZIP filename stem. Also fixes _FRAMEWORK_VERSION_RE regex to accept PILLAR3-style framework codes, and adds smoke tests proving EBA-NAME-001..060 already work for CSV via format-agnostic dispatch. Closes #102 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce a shared_cache dict on ValidationContext, created once per run_validation() call and reused across all rules. This eliminates ~100 redundant ZIP opens per file by caching parsed data (namelist, report.json, parameters.csv, FilingIndicators.csv, data tables, variable lookup, namespace map, zip root prefix).
Additional optimisations:
- Skip reading CSV ZIP bytes into memory (unused by CSV rules)
- Cache module index at module level
- Centralise duplicated _build_variable_lookup into _helpers.py
- Remove redundant second ZipFile.extractall in CsvInstance.parse()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
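The effect of a per-run shared cache can be illustrated with a small load-once helper. The `cached` function, the cache key, and the in-memory ZIP are assumptions for the example; the PR only states that a `shared_cache` dict exists on the context and is reused across rules.

```python
import io
import zipfile
from typing import Any, Callable, Dict, List


def cached(cache: Dict[str, Any], key: str, loader: Callable[[], Any]) -> Any:
    """Return cache[key], calling loader() only the first time the key is seen."""
    if key not in cache:
        cache[key] = loader()
    return cache[key]


def make_zip() -> bytes:
    """Build a tiny in-memory ZIP standing in for a CSV report package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("pkg/reports/report.json", "{}")
        zf.writestr("pkg/reports/parameters.csv", "name,value\n")
    return buf.getvalue()


ZIP_BYTES = make_zip()
open_count = 0


def load_namelist() -> List[str]:
    global open_count
    open_count += 1  # counts how often the archive is actually opened
    with zipfile.ZipFile(io.BytesIO(ZIP_BYTES)) as zf:
        return zf.namelist()


shared_cache: Dict[str, Any] = {}  # one dict per validation run
first = cached(shared_cache, "namelist", load_namelist)
second = cached(shared_cache, "namelist", load_namelist)
```

Because the dict is created per run rather than at module level, nothing cached for one file can leak into the validation of another.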
Bump version to 2.0.0rc2. Adds full CSV-side validation (structural rules CSV-001..CSV-062, EBA rules for entity, decimals, units, currency, guidance, and naming). Includes shared-cache performance optimisation eliminating ~100 redundant ZIP opens per file. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Non-monetary facts (pure unit, no unit) in a denomination context are valid non-currency metrics (e.g. percentages, counts) and should not be flagged by the currency-of-denomination rule. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add validation sections to README.rst covering CLI usage (xbridge validate subcommand with --eba, --post-conversion, --json options) and Python API (validate() function with dict-based return format examples). Fix docs/validation.rst to accurately reflect the dict-based return format of validate() instead of the outdated ValidationResult object-style API. Update docs/index.rst What's New section from 1.5.x to 2.0.0rc2/rc1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a combined pipeline that validates an XBRL-XML file before converting, then validates the resulting CSV post-conversion. Available as both CLI flags (--validate, --eba) and Python API parameters (validate=, eba=), defaulting to off.
- Add ValidationError exception with results/path attributes
- Add validate/eba parameters to convert_instance()
- Add --validate and --eba flags to convert CLI command
- Update README.rst and docs/cli.rst with new options
- Add pipeline tests (9 tests covering all branches)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
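The validate-convert-validate flow can be sketched with injected callables. Several things here are assumptions: the wrapper name `convert_with_validation`, the shape of the results (a list of dicts with a `severity` key), and the policy that only ERROR findings abort the pipeline. Only `ValidationError` and its `results`/`path` attributes are taken from the commit message.

```python
from typing import Callable, Dict, List

Results = List[Dict[str, str]]


class ValidationError(Exception):
    """Raised when validation blocks the pipeline; carries the findings and file."""

    def __init__(self, results: Results, path: str) -> None:
        super().__init__(f"{len(results)} blocking finding(s) in {path}")
        self.results = results
        self.path = path


def _raise_on_errors(results: Results, path: str) -> None:
    errors = [r for r in results if r.get("severity") == "ERROR"]
    if errors:
        raise ValidationError(errors, path)


def convert_with_validation(path: str,
                            convert_fn: Callable[[str], str],
                            validate_fn: Callable[..., Results],
                            validate: bool = False,
                            eba: bool = False) -> str:
    if validate:  # step 1: validate the XML input
        _raise_on_errors(validate_fn(path, eba=eba, post_conversion=False), path)
    output = convert_fn(path)  # step 2: convert
    if validate:  # step 3: validate the CSV output in post-conversion mode
        _raise_on_errors(validate_fn(output, eba=eba, post_conversion=True), output)
    return output


# Stand-ins for the real converter and validator, for demonstration only.
calls = []


def fake_convert(path: str) -> str:
    return path.rsplit(".", 1)[0] + ".zip"


def fake_validate(path: str, eba: bool, post_conversion: bool) -> Results:
    calls.append((path, post_conversion))
    return [{"severity": "WARNING", "code": "XML-000"}]


result = convert_with_validation("report.xbrl", fake_convert, fake_validate, validate=True)
```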
…nd breaking changes
…'false', '1', or '0', correct CSV-025 to ensure that only reported datapoints are considered.
…ns and corrections
…R-era values in reportPackage.json and report.json, so converted files now pass CSV-003 and CSV-011 validation.
Scenario.parse() crashed with IndexError on dimension attributes lacking
a colon (e.g. dimension="qCAA"), which silently prevented XmlInstance
from loading and caused all taxonomy-based validation rules (XML-070/
071/072) to be skipped.
- Fix split logic in Scenario.parse() to handle unprefixed dimensions
- Add fallback module_ref extraction in the validation engine so
taxonomy rules can still run when XmlInstance parsing fails
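The failure mode and one possible fix can be shown in isolation. This is a sketch, not the code in `Scenario.parse()`: `split_qname` is a hypothetical helper, and `str.partition` is one of several ways to handle an unprefixed value.

```python
from typing import Optional, Tuple


def split_qname(value: str) -> Tuple[Optional[str], str]:
    """Split 'prefix:local' into its parts; a bare name yields (None, name)."""
    prefix, sep, local = value.partition(":")
    if not sep:
        return None, value  # no colon: the whole value is the localname
    return prefix, local


def naive_localname(value: str) -> str:
    # The pattern that fails: index 1 does not exist when there is no colon.
    return value.split(":")[1]


prefixed = split_qname("eba_dim:BAS")
bare = split_qname("qCAA")
```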
…nd documentation updates
This commit introduces a new documentation file detailing the validation rules supported by xbridge. The rules are categorized into XML Instance Rules, CSV Report Package Rules, and Submission Package Naming Rules, each with unique identifiers, severity levels, and descriptions. The documentation also includes attributes controlling rule execution, input format detection, and a summary of rule coverage across formats. This enhancement aims to provide clear guidance for users on compliance and validation requirements.
…entation and code comments
…leanup. Ensures validity for Windows.
…or specific error message URL
javihern98
approved these changes
Mar 18, 2026
javihern98
left a comment
Contributor
Looks good, thanks! 😊
Description
Add a comprehensive validation engine to xbridge that checks XBRL instance files (both XML and CSV formats) against 90+ structural and EBA regulatory rules. This includes a standalone validation API, CLI integration, a validate-convert-validate pipeline, and full test coverage. The version is bumped to 2.0.0.
Type of Change
Related Issues
Closes #
Related #
Changes Made
- `xbridge.validation` module with registry, context, engine, and models: a rule-based architecture that discovers and executes validation functions by code.
- … (XML-040..043), unit UTR reference (XML-050), document-level checks (XML-060..069), and taxonomy conformance (XML-070..072).
- … data table checks (CSV-040..049), fact-level checks (CSV-050..052), and taxonomy conformance (CSV-060..062).
- … conventions (EBA-NAME), and supplementary regulatory checks (EBA-2.x).
- … `validate` subcommand with `--eba`, `--post-conversion`, and `--json` flags. New `--validate` and `--eba` flags on the `convert` command for the validate-convert-validate pipeline.
- Converter fixes: Updated `reportPackage.json` and `report.json` to use final XBRL specification URLs (passes CSV-003 and CSV-011).
- … `finrep9dp` module support.
- … `docs/validation.rst` and `docs/validation_rules.rst` with API reference, usage examples, and complete rule catalog aligned to EBA Filing Rules v5.8.
Testing
Tests Added
Testing Performed
Tests cover all validation rule modules, the engine, registry, models, context, API, and the validate-convert-validate pipeline.