Feat/add validations by antonio-olleros · Pull Request #106 · Meaningful-Data/xbridge

antonio-olleros · 2026-03-18T10:32:18Z

Description

Add a comprehensive validation engine to xbridge that checks XBRL instance files (both XML and CSV formats) against 90+ structural and EBA regulatory rules. This includes a standalone
validation API, CLI integration, a validate-convert-validate pipeline, and full test coverage. The version is bumped to 2.0.0.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Performance improvement
Code refactoring
Dependency update
Other (please describe):

Related Issues

Closes #
Related #

Changes Made

Validation engine: New xbridge.validation module with registry, context, engine, and models — rule-based architecture that discovers and executes validation functions by code.
XML validation rules (30+): Well-formedness (XML-001..003), schemaRef (XML-010/012), filing indicators (XML-020..026), context structure (XML-030..035), fact structure
(XML-040..043), unit UTR reference (XML-050), document-level checks (XML-060..069), and taxonomy conformance (XML-070..072).
CSV validation rules (30+): Report package structure (CSV-001..006), report.json metadata (CSV-010..016), parameters.csv (CSV-020..026), FilingIndicators.csv (CSV-030..035),
data table checks (CSV-040..049), fact-level checks (CSV-050..052), and taxonomy conformance (CSV-060..062).
EBA-specific rules (25+): Entity identifier (EBA-ENTITY), currency (EBA-CUR), units (EBA-UNIT), decimals accuracy (EBA-DEC), guidance compliance (EBA-GUIDE), file naming
conventions (EBA-NAME), and supplementary regulatory checks (EBA-2.x).
CLI integration: New validate subcommand with --eba, --post-conversion, and --json flags. New --validate and --eba flags on the convert command for the validate-convert-validate pipeline.
Converter fixes: Updated reportPackage.json and report.json to use final XBRL specification URLs (passes CSV-003 and CSV-011).
EBA Taxonomy 4.2.1: Added finrep9dp module support.
Documentation: New docs/validation.rst and docs/validation_rules.rst with API reference, usage examples, and complete rule catalog aligned to EBA Filing Rules v5.8.

Testing

Tests Added

Unit tests
Integration tests
Test coverage maintained or improved

Testing Performed

Tests cover all validation rule modules, the engine, registry, models, context, API, and the validate-convert-validate pipeline.

pytest tests/                                             

Test results:                                                                                                                                                                          
- All existing tests pass
- New tests pass                                                                                                                                                                       
- Manual testing performed                                

Documentation                                                                                                                                                                          

- Updated docstrings                                                                                                                                                                   
- Updated README.md                                       
- Updated documentation in docs/                                                                                                                                                       
- Updated CHANGELOG.md (added entry under "Unreleased")                                                                                                                                
- No documentation needed for this change                                                                                                                                              
                                                                                                                                                                                       
Code Quality                                                                                                                                                                           
                                                                                                                                                                                       
- Code follows the project's style guidelines (Ruff)                                                                                                                                   
- Ran ruff check and ruff format
- Ran mypy type checking                                                                                                                                                               
- Self-review of code completed                                                                                                                                                        
- Comments added for complex/non-obvious code                                                                                                                                          
- No new warnings generated                                                                                                                                                            
                                                                                                                                                                                       
Breaking Changes                                                                                                                                                                       
                                                                                                                                                                                       
Impact:                                                                                                                                                                                
- Converter output URLs updated to final XBRL spec URLs — regenerated CSV packages will differ from previous output.                                                                              
                                                                                                                                                                                       
Migration guide:                                                                                                                                                                       
- Re-run conversions if downstream tooling compares output ZIPs byte-for-byte.                                                                                                         
                                                                              
Screenshots (if applicable)                                                                                                                                                            
                                                                                                                                                                                       
N/A                                                                                                                                                                                    
                                                                                                                                                                                       
Checklist                                                 

- My code follows the project's code style                                                                                                                                             
- I have performed a self-review of my code
- I have commented my code, particularly in hard-to-understand areas                                                                                                                   
- I have made corresponding changes to the documentation  
- My changes generate no new warnings                                                                                                                                                  
- I have added tests that prove my fix is effective or that my feature works
- New and existing unit tests pass locally with my changes                                                                                                                             
- Any dependent changes have been merged and published
- I have updated the CHANGELOG.md                                                                                                                                                      
                                                                                                                                                                                       
Additional Notes                                                                                                                                                                       
                                                                                                                                                                                       
This PR contains 78 commits spanning the full validation feature build-out, from initial architecture through all rule implementations, performance optimizations, and release         
candidates up to v2.0.0. Rules are aligned with EBA Filing Rules v5.8.
                                                                                                                                                                                       
Reviewer Notes                                                                                                                                                                         

Areas to focus on:                                                                                                                                                                     
- Validation engine architecture (src/xbridge/validation/_engine.py, _registry.py, _models.py)                                                                                                                                        
- CSV and XML rule correctness, especially taxonomy conformance checks
                                                                                                                                                                                       
Questions for reviewers:

- Add explicit CSV guard to the post_conversion filter in §2.1 decision diagram, making it consistent with the §1.2 prose that states post_conversion has no effect for .xbrl files.
- Replace imprecise claim in §2.2 that "sections 2.8–2.14 are executed" with accurate description: only rules marked Post-conv. = Yes survive (sections 2.11–2.14), while entity (§2.8), decimals (§2.9), and currency (§2.10) are also skipped.
Closes #62, Closes #63
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Define the technical architecture for the xbridge validation module:
- Rule registry schema (registry.json) with format-specific overrides
- Module structure with 22 rule implementation files
- Core components: models, registry, engine, context
- Rule selection logic matching the specification decision diagram
- Public API (validate function) and integration points
- Full rule coverage summary (98 unique rules)
Companion to validation_specification.md and validations_enumeration.md.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Revised version and date to reflect the latest draft.
- Enhanced rule attributes section to clarify execution conditions.
- Organized rules into XML Instance and CSV Report Package categories.
- Updated descriptions and added EBA references for various rules.
- Improved clarity and consistency in rule formatting and structure.

Implement the core data classes for the validation module:
- Severity enum (ERROR, WARNING, INFO)
- RuleDefinition with format-specific severity/message overrides
- ValidationResult matching specification §1.5
Includes 17 passing tests, ruff clean, mypy strict clean.
Closes #65
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
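As a rough illustration of the data classes listed in this commit, the sketch below shows one way a rule definition could carry format-specific overrides. Only the names Severity and RuleDefinition come from the commit message; the field names, the `overrides` shape, and the `effective()` helper are assumptions made for this example, not xbridge's actual model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Tuple


class Severity(Enum):
    ERROR = "ERROR"
    WARNING = "WARNING"
    INFO = "INFO"


@dataclass(frozen=True)
class RuleDefinition:
    code: str
    severity: Severity
    message: str
    # Hypothetical shape: format name -> {"severity": ..., "message": ...}
    overrides: Dict[str, dict] = field(default_factory=dict)

    def effective(self, fmt: str) -> Tuple[Severity, str]:
        """Return the severity and message that apply to the given format."""
        override = self.overrides.get(fmt, {})
        return (
            override.get("severity", self.severity),
            override.get("message", self.message),
        )


# DEMO-001 is a made-up rule code used only for illustration.
rule = RuleDefinition(
    code="DEMO-001",
    severity=Severity.ERROR,
    message="{name} is missing",
    overrides={"csv": {"severity": Severity.WARNING}},
)
```

Here the CSV override downgrades the severity while the message falls back to the shared default.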

Implement the registry module linking JSON rule definitions to Python implementation functions via a decorator pattern:
- load_registry() reads and parses registry.json
- @rule_impl decorator registers implementation functions
- get_rule_impl() resolves implementations with format-specific priority
- Initial registry.json with XML-001 entry
Includes 12 passing tests, ruff clean, mypy strict clean.
Closes #66
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
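The decorator pattern described here might look roughly like the following sketch. The names rule_impl and get_rule_impl come from the commit message; their signatures, the `_IMPLS` dict, the `fmt` parameter, and the DEMO-* rule codes are assumptions for illustration only.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical store: (rule code, format or None) -> implementation function.
_IMPLS: Dict[Tuple[str, Optional[str]], Callable[..., None]] = {}


def rule_impl(code: str, fmt: Optional[str] = None):
    """Register the decorated function as the implementation of `code`."""
    def decorator(func):
        key = (code, fmt)
        if key in _IMPLS:
            raise ValueError(f"duplicate implementation for {key}")
        _IMPLS[key] = func
        return func
    return decorator


def get_rule_impl(code: str, fmt: str):
    """Prefer a format-specific implementation, else fall back to a shared one."""
    return _IMPLS.get((code, fmt)) or _IMPLS.get((code, None))


@rule_impl("DEMO-001")  # shared by every format
def check_shared(ctx) -> None:
    ...


@rule_impl("DEMO-001", fmt="csv")  # takes priority for CSV input
def check_csv(ctx) -> None:
    ...
```

Registering at import time keeps the JSON definitions and the Python functions decoupled; the lookup only needs the rule code and the detected input format.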

Implement the context object passed to every rule implementation:
- Carries rule_set, rule_definition, file_path, raw_bytes, and parsed instances (xml_instance, csv_instance, module)
- add_finding() renders message templates with format_map and gracefully handles missing placeholders
- Respects format-specific severity/message overrides from registry
Includes 10 passing tests, ruff clean, mypy strict clean.
Closes #67
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
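One common way to make format_map tolerate missing placeholders is a dict subclass with `__missing__`. The sketch below assumes that approach; `_SafeDict` and `render_message` are hypothetical names, and the real add_finding() may handle this differently.

```python
class _SafeDict(dict):
    """Leave unknown placeholders in place instead of raising KeyError."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def render_message(template: str, **values: object) -> str:
    # str.format_map looks each placeholder up in the mapping, so a
    # missing key falls through to _SafeDict.__missing__.
    return template.format_map(_SafeDict(values))


rendered = render_message("{code}: fact {fact} uses {attr}", code="XML-040", fact="f1")
```

With this approach a template that names a value the rule did not supply still renders, with the unresolved placeholder left visible.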

Implement the orchestration layer tying registry, context, and rules:
- select_rules() filters by format, EBA flag, and post-conversion
- run_validation() detects format, loads registry, parses input, resolves taxonomy module, and executes rule implementations
- Graceful handling of parse failures and missing implementations
Includes 16 passing tests, ruff clean, mypy strict clean.
Closes #68
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
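A minimal sketch of the three filters, assuming a simplified `Rule` record. The field names and the flags on the sample rules are illustrative assumptions; only the filtering order (format, EBA flag, post-conversion with a CSV guard) follows the commit messages in this PR.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    code: str
    formats: FrozenSet[str]       # input formats the rule applies to
    eba: bool = False             # runs only when EBA checks are requested
    post_conversion: bool = True  # still runs on freshly converted output


def select_rules(rules: List[Rule], fmt: str, eba: bool = False,
                 post_conversion: bool = False) -> List[Rule]:
    selected = []
    for rule in rules:
        if fmt not in rule.formats:
            continue
        if rule.eba and not eba:
            continue
        # Guard: the post-conversion filter only has an effect for CSV input.
        if post_conversion and fmt == "csv" and not rule.post_conversion:
            continue
        selected.append(rule)
    return selected


# Sample data; the flag values are assumptions for this example.
RULES = [
    Rule("XML-003", frozenset({"xml"})),
    Rule("XML-002", frozenset({"xml"}), eba=True),
    Rule("EBA-ENTITY-001", frozenset({"xml", "csv"}), eba=True, post_conversion=False),
    Rule("CSV-001", frozenset({"csv"})),
]


def codes(selected: List[Rule]) -> List[str]:
    return [rule.code for rule in selected]
```

Calling `select_rules(RULES, "csv", eba=True, post_conversion=True)` drops the entity rule, while the same flags on XML input leave it in place because of the guard.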

- Add XML-001 well-formedness check (xml_wellformedness.py)
- Add `xbridge validate` CLI subcommand with --eba, --post-conversion, --json flags
- Update CLI documentation with validate command usage and examples
- Fix test setup_method pattern to avoid double-registration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add XML-002 rule that verifies the XML declaration encoding is UTF-8 (case-insensitive). Files without an explicit encoding attribute pass since UTF-8 is the XML default. This is an EBA-only rule (ref §1.4). Closes #70 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
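The encoding check can be sketched with a regular expression over the raw bytes. This is an assumption about one possible approach, not the actual XML-002 implementation; `encoding_is_utf8` is a hypothetical helper, and the sketch only handles ASCII-compatible input.

```python
import re

_DECL_RE = re.compile(rb"<\?xml\s[^>]*\?>")
_ENC_RE = re.compile(rb"encoding\s*=\s*[\"']([^\"']+)[\"']")
_UTF8_BOM = b"\xef\xbb\xbf"


def encoding_is_utf8(raw: bytes) -> bool:
    """Return True when the XML declaration is absent, has no encoding, or says UTF-8."""
    if raw.startswith(_UTF8_BOM):
        raw = raw[len(_UTF8_BOM):]
    declaration = _DECL_RE.match(raw)
    if declaration is None:
        return True  # no declaration: UTF-8 is the XML default
    encoding = _ENC_RE.search(declaration.group(0))
    if encoding is None:
        return True  # declaration without an explicit encoding
    return encoding.group(1).decode("ascii", "replace").lower() == "utf-8"
```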

Add XML-003 rule that verifies the root element is {http://www.xbrl.org/2003/instance}xbrl. Skips silently on malformed XML (XML-001 handles that). Non-EBA rule, always runs. Closes #71 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
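A sketch of the root element check, using the standard library parser to stay self-contained (other commits in this PR indicate xbridge uses lxml). `root_findings` is a hypothetical helper, and the message text is made up.

```python
import xml.etree.ElementTree as ET
from typing import List

XBRL_ROOT = "{http://www.xbrl.org/2003/instance}xbrl"


def root_findings(raw: bytes) -> List[str]:
    try:
        root = ET.fromstring(raw)
    except ET.ParseError:
        return []  # malformed XML is left to the well-formedness rule
    if root.tag != XBRL_ROOT:
        return [f"Root element is {root.tag}, expected {XBRL_ROOT}"]
    return []
```

Returning an empty list on a parse error mirrors the "skips silently" behaviour, so a broken file produces one finding (from XML-001) instead of two.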

XML-010: Exactly one link:schemaRef element MUST be present.
XML-012: The schemaRef MUST resolve to a known entry point URL.
Closes #72
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ng indicator structural checks
XML-020: At least one find:fIndicators element MUST be present.
XML-021: At least one filing indicator MUST exist.
XML-025: No duplicate filing indicators.
XML-026: Filing indicator contexts MUST NOT contain segment or scenario.
Closes #73
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Validates that filing indicator codes match known table codes from the module JSON. Uses ctx.module.tables to build the set of valid codes, avoiding direct taxonomy access. Closes #74 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add six EBA context validation rules:
- XML-030: period dates must be xs:date (no dateTime/timezone)
- XML-031: all periods must be instants (not durations)
- XML-032: all periods must share the same reference date
- XML-033: all entity identifiers must be identical across contexts
- XML-034: xbrli:segment must not be used
- XML-035: xbrli:scenario children must be dimension members only
Performance: reuses already-parsed lxml tree from ctx.xml_instance.root when available, avoiding redundant XML parsing across all six rules.
Closes #75
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Move XML parsing into the engine: xml_root is computed once and passed to every ValidationContext via a new xml_root attribute. All rule modules (xml_wellformedness, xml_root_element, xml_schema_ref, xml_filing_indicators, xml_context) now use ctx.xml_root instead of calling etree.fromstring() independently.
Before: XML parsed N times (once per rule that needs the tree).
After: XML parsed exactly once in run_validation(), shared across all rules.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add four fact validation rules:
- XML-040: @precision must not be used (use @decimals) [EBA]
- XML-041: @decimals value must be a valid integer or "INF" [non-EBA]
- XML-042: @xsi:nil must not be used on facts [EBA]
- XML-043: string-type facts must not be empty [EBA]
Facts are identified as direct root children not in infrastructure namespaces (xbrli, link, find) using a frozenset for O(1) lookup. All rules use ctx.xml_root (single-parse architecture).
Closes #76
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
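The fact-identification step and the @decimals check could be sketched as below. The helper names are hypothetical, the standard library parser stands in for lxml, and the namespace URI given for the find prefix is an assumption that should be checked against the taxonomy.

```python
import re
import xml.etree.ElementTree as ET

INFRA_NAMESPACES = frozenset({
    "http://www.xbrl.org/2003/instance",   # xbrli
    "http://www.xbrl.org/2003/linkbase",   # link
    "http://www.eurofiling.info/xbrl/ext/filing-indicators",  # find (assumed URI)
})
_INTEGER_RE = re.compile(r"[+-]?[0-9]+")


def namespace_of(tag: str) -> str:
    """Extract the namespace URI from an ElementTree '{uri}local' tag."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def iter_facts(root):
    """Yield direct children of the root that are not infrastructure elements."""
    for child in root:
        if namespace_of(child.tag) not in INFRA_NAMESPACES:
            yield child


def decimals_is_valid(value: str) -> bool:
    return value == "INF" or _INTEGER_RE.fullmatch(value) is not None


# Made-up instance; the met namespace is a placeholder.
SAMPLE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:link="http://www.xbrl.org/2003/linkbase"
    xmlns:met="http://example.com/met">
  <link:schemaRef/>
  <xbrli:context id="c1"/>
  <met:mi1 contextRef="c1" decimals="-3">1000</met:mi1>
  <met:mi2 contextRef="c1" decimals="two">5</met:mi2>
</xbrli:xbrl>"""

facts = list(iter_facts(ET.fromstring(SAMPLE)))
bad_decimals = [
    fact.tag for fact in facts
    if "decimals" in fact.attrib and not decimals_is_valid(fact.attrib["decimals"])
]
```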

Single-pass scan architecture: one iteration over every element collects prohibited elements/attributes (XML-060..064), contextRef/unitRef inventories (XML-066, XML-068), and context/unit elements for duplicate detection (XML-067, XML-069). The result is cached per root so all 10 rules reuse the same scan.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…hecks
XML-070: fact concepts must be defined in the module taxonomy
XML-071: explicit dimension QNames must be defined in the taxonomy
XML-072: dimension member values must be valid for their dimension
Uses cached taxonomy extraction from Module and single-pass XML scan. Open key dimensions skip member validation. Supports both datapoints and headers architecture.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Variable.from_dict strips namespace prefixes from dimension keys (e.g., "eba_dim:BAS" → "BAS"). The taxonomy extraction now handles both prefixed and bare localname keys, matching dimensions by localname only. Member values retain their prefix and are still resolved via namespace URI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
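Matching dimensions by localname, regardless of whether a key arrives prefixed or bare, can be sketched as follows. `localname` and `unknown_dimensions` are hypothetical helpers for illustration; XYZ and MCY are sample dimension codes.

```python
from typing import Iterable, List


def localname(qname: str) -> str:
    """Drop any namespace prefix: 'eba_dim:BAS' and 'BAS' both give 'BAS'."""
    return qname.rsplit(":", 1)[-1]


def unknown_dimensions(fact_dims: Iterable[str], taxonomy_dims: Iterable[str]) -> List[str]:
    """Return the fact dimensions whose localname the taxonomy does not define."""
    known = {localname(dim) for dim in taxonomy_dims}
    return sorted(dim for dim in fact_dims if localname(dim) not in known)


# The taxonomy side mixes a bare key and a versioned-prefix key on purpose.
missing = unknown_dimensions(["eba_dim:BAS", "eba_dim:XYZ"], ["BAS", "eba_dim_4.0:MCY"])
```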

Regression tests that construct variables through Variable.from_dict (the real production deserialization path) to ensure dimension key prefix stripping does not break taxonomy validation. Covers: concept resolution, dimension matching with stripped prefixes, member validation, and versioned prefixes (e.g., eba_dim_4.0:BAS). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add *.skill to .gitignore
- Remove unused import and sort imports in test_validation_engine.py
- Add submission package naming rules (section 3) to validations spec
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add EBA-NAME-071: the root folder inside a CSV report package ZIP must match the ZIP filename stem. Also fixes _FRAMEWORK_VERSION_RE regex to accept PILLAR3-style framework codes, and adds smoke tests proving EBA-NAME-001..060 already work for CSV via format-agnostic dispatch. Closes #102 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
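The folder-versus-stem comparison can be sketched from a ZIP name list alone. `root_folder_matches` is a hypothetical helper and the file names are made up; the real rule may report findings differently.

```python
from pathlib import PurePosixPath
from typing import Iterable


def root_folder_matches(zip_filename: str, namelist: Iterable[str]) -> bool:
    """True when every ZIP entry sits under one top-level folder named like the ZIP stem."""
    stem = PurePosixPath(zip_filename).stem
    roots = {name.split("/", 1)[0] for name in namelist if name}
    return roots == {stem}


good = root_folder_matches(
    "demo_report.zip",
    ["demo_report/META-INF/reportPackage.json", "demo_report/reports/report.json"],
)
bad = root_folder_matches(
    "demo_report.zip",
    ["other/META-INF/reportPackage.json", "other/reports/report.json"],
)
```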

Introduce a shared_cache dict on ValidationContext, created once per run_validation() call and reused across all rules. This eliminates ~100 redundant ZIP opens per file by caching parsed data (namelist, report.json, parameters.csv, FilingIndicators.csv, data tables, variable lookup, namespace map, zip root prefix).
Additional optimisations:
- Skip reading CSV ZIP bytes into memory (unused by CSV rules)
- Cache module index at module level
- Centralise duplicated _build_variable_lookup into _helpers.py
- Remove redundant second ZipFile.extractall in CsvInstance.parse()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
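The effect of a per-run shared cache can be shown with a small get-or-compute helper. This is a sketch under assumptions: `cached`, `read_namelist`, and the cache key are hypothetical names, and the in-memory ZIP stands in for a real report package.

```python
import io
import zipfile
from typing import Any, Callable, Dict, List


def cached(cache: Dict[str, Any], key: str, loader: Callable[[], Any]) -> Any:
    """Return cache[key], computing it with loader() on first use only."""
    if key not in cache:
        cache[key] = loader()
    return cache[key]


def build_package() -> bytes:
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("pkg/META-INF/reportPackage.json", "{}")
        zf.writestr("pkg/reports/report.json", "{}")
    return buffer.getvalue()


zip_opens: List[int] = []  # one entry per real ZIP open


def read_namelist(data: bytes) -> List[str]:
    zip_opens.append(1)
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


data = build_package()
shared_cache: Dict[str, Any] = {}  # one dict per validation run
for _rule in range(3):  # three rules asking for the same name list
    names = cached(shared_cache, "namelist", lambda: read_namelist(data))
```

Three rules request the name list, but the archive is opened once; creating the dict per run keeps results from leaking between files.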

Bump version to 2.0.0rc2. Adds full CSV-side validation (structural rules CSV-001..CSV-062, EBA rules for entity, decimals, units, currency, guidance, and naming). Includes shared-cache performance optimisation eliminating ~100 redundant ZIP opens per file. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Non-monetary facts (pure unit, no unit) in a denomination context are valid non-currency metrics (e.g. percentages, counts) and should not be flagged by the currency-of-denomination rule. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add validation sections to README.rst covering CLI usage (xbridge validate subcommand with --eba, --post-conversion, --json options) and Python API (validate() function with dict-based return format examples). Fix docs/validation.rst to accurately reflect the dict-based return format of validate() instead of the outdated ValidationResult object-style API. Update docs/index.rst What's New section from 1.5.x to 2.0.0rc2/rc1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add a combined pipeline that validates an XBRL-XML file before converting, then validates the resulting CSV post-conversion. Available as both CLI flags (--validate, --eba) and Python API parameters (validate=, eba=), defaulting to off.
- Add ValidationError exception with results/path attributes
- Add validate/eba parameters to convert_instance()
- Add --validate and --eba flags to convert CLI command
- Update README.rst and docs/cli.rst with new options
- Add pipeline tests (9 tests covering all branches)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
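The validate-convert-validate flow can be sketched with injected callables, so the example runs without xbridge installed. The name ValidationError and its results/path attributes come from the commit message; `convert_checked`, the finding dicts with a "severity" key, and the choice to stop only on ERROR findings are assumptions for illustration.

```python
from typing import Callable, List


class ValidationError(Exception):
    """Raised when a validation step reports blocking findings."""

    def __init__(self, results: List[dict], path: str) -> None:
        super().__init__(f"{len(results)} error(s) in {path}")
        self.results = results
        self.path = path


def _errors(findings: List[dict]) -> List[dict]:
    return [finding for finding in findings if finding.get("severity") == "ERROR"]


def convert_checked(src: str, convert: Callable[[str], str],
                    validate: Callable[..., List[dict]], eba: bool = False) -> str:
    # Step 1: validate the XML input before doing any work.
    errors = _errors(validate(src, eba=eba, post_conversion=False))
    if errors:
        raise ValidationError(errors, src)
    # Step 2: convert.
    out = convert(src)
    # Step 3: validate the produced CSV package in post-conversion mode.
    errors = _errors(validate(out, eba=eba, post_conversion=True))
    if errors:
        raise ValidationError(errors, out)
    return out
```

Carrying the path on the exception lets a caller tell whether the input or the converted output failed.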

Scenario.parse() crashed with IndexError on dimension attributes lacking a colon (e.g. dimension="qCAA"), which silently prevented XmlInstance from loading and caused all taxonomy-based validation rules (XML-070/071/072) to be skipped.
- Fix split logic in Scenario.parse() to handle unprefixed dimensions
- Add fallback module_ref extraction in the validation engine so taxonomy rules can still run when XmlInstance parsing fails
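The failure mode and one possible fix can be reproduced in isolation. `split_dimension` is a hypothetical helper; the actual change inside Scenario.parse() may be written differently.

```python
from typing import Optional, Tuple


def split_dimension(value: str) -> Tuple[Optional[str], str]:
    """Split 'prefix:local' into its parts; the prefix is None when there is no colon."""
    prefix, separator, local = value.rpartition(":")
    return (prefix if separator else None), local


# The naive pattern fails on unprefixed values such as "qCAA".
try:
    "qCAA".split(":")[1]
    naive_failed = False
except IndexError:
    naive_failed = True
```

`str.rpartition` always returns three parts, so the unprefixed case needs no special branch beyond checking the separator.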

This commit introduces a new documentation file detailing the validation rules supported by xbridge. The rules are categorized into XML Instance Rules, CSV Report Package Rules, and Submission Package Naming Rules, each with unique identifiers, severity levels, and descriptions. The documentation also includes attributes controlling rule execution, input format detection, and a summary of rule coverage across formats. This enhancement aims to provide clear guidance for users on compliance and validation requirements.

javihern98

Looks good, thanks! 😊

antonio-olleros and others added 30 commits February 5, 2026 20:45

First specification draft

52b967d

feat: Enhance validation API with detailed docstring and add comprehensive tests

cb5b973

Merge remote-tracking branch 'origin/main' into feat/add-validations

424c34e

feat: Add standalone validation API and XML well-formedness rule with tests

cd8e986

feat(validation): implement XML-010 and XML-012 — schemaRef checks

a426a87

feat(validation): implement XML-050 — unit UTR reference check

4996158

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chore: gitignore *.skill, lint fixes, extend validation spec

8fef49b

feat(validation): implement EBA-ENTITY-001, EBA-ENTITY-002 — entity identifier checks

bb83b50

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(validation): implement EBA-CUR-001, EBA-CUR-002, EBA-CUR-003 — currency checks

ee7d2fc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(validation): implement EBA-UNIT-001, EBA-UNIT-002 — non-monetary unit checks

eb16f19

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(validation): implement EBA-2.5, EBA-2.16.1, EBA-2.24, EBA-2.25 — additional checks

4c98f96

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

antonio-olleros and others added 21 commits March 2, 2026 12:35

Updated taxonomy to 4.2.1

291a735

chore: release v2.0.0rc3 — validate-convert-validate pipeline and taxonomy 4.2.1

c273b75

chore: release v2.0.0rc4 — refactor dim-dom mapping to be inline

74fa352

feat(validation): enhance validation structure to support XBRL and EBA sections

9f6ce5d

feat(release): update to version 2.0.0rc5 with new validation rules and breaking changes

3a19a50

feat(validation): update CSV-033 to accept boolean values as 'true', 'false', '1', or '0', correct CSV-025 to ensure that only reported datapoints are considered.

9c0496b

feat(release): update to version 2.0.0rc6 with enhanced CSV validations and corrections

8d39f76

Fix converter to use final XBRL specification URLs instead of draft/CR-era values in reportPackage.json and report.json, so converted files now pass CSV-003 and CSV-011 validation.

63eee76

feat(release): update to version 2.0.0rc8 with validation fixes and enhancements

7bf2bea

Merge branch 'main' of github.com:Meaningful-Data/xbridge into feat/add-validations

ea10bce

feat(release): update to version 2.0.0 with validation enhancements and documentation updates

d9fe2bb

feat(validation): update severity of CSV-049 rule to ERROR and adjust related tests

0352e8a

feat(validation): update references to EBA Filing Rules v5.8 in documentation and code comments

61140ae

antonio-olleros requested a review from javihern98 March 18, 2026 10:32

github-advanced-security AI found potential problems Mar 18, 2026

Comment thread tests/test_eba_entity.py Fixed

antonio-olleros added 4 commits March 18, 2026 12:08

feat(validation): update Python version requirements to 3.10 in configuration files

86ce65c

feat(validation): update Python version requirements to 3.10 in poetry.lock

180be4a

feat(validation): update temporary file handling in tests to ensure cleanup. Ensures validity for Windows.

41ff252

feat(validation): update test assertion for EBA-ENTITY-001 to check for specific error message URL

304a479

javihern98 approved these changes Mar 18, 2026

antonio-olleros merged commit 539bfbd into main Mar 18, 2026
16 checks passed

javihern98 deleted the feat/add-validations branch June 1, 2026 14:13

antonio-olleros merged 80 commits into main from feat/add-validations
