Added numeric digits over the HR codes #720

Open
javihern98 wants to merge 1 commit into develop from fix/numeric-codes-codeItem

Conversation

@javihern98

javihern98 commented Jun 5, 2026

Collaborator

Summary

VTL 2.1 accepted Code Item values starting with a digit (e.g. 1AA) in hierarchical rulesets, because its IDENTIFIER token allowed digit-leading names. In 2.2 IDENTIFIER was narrowed to letter/underscore-leading (for SDMX ids, e0ae21e), which also dropped these codes. This restores the 2.1 behaviour for hierarchical-ruleset code items only, without re-widening IDENTIFIER globally.

Example

Now parses under 2.2 (previously failed — 1AA lexed as 1 + AA):

define hierarchical ruleset TEST_HR (valuedomain rule TEST_VD) is
    TEST_1: 1AA = 1AB + 1AC
end hierarchical ruleset;

Changes

  • VtlTokens.g4: new token ITEM_CODE : CODE_PART ; (CODE_PART = [A-Za-z0-9_]+), placed after IDENTIFIER so it only wins for digit-leading lexemes nothing else can match.
  • Vtl.g4: valueDomainValue now accepts IDENTIFIER | ITEM_CODE | signedInteger | signedNumber.
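
The token-ordering argument above can be illustrated with a small Python emulation of ANTLR-style lexing (longest match wins, ties go to the rule defined first). This is only a sketch: the `INTEGER` and `IDENTIFIER` patterns are simplified stand-ins assumed for illustration, not the real rules in VtlTokens.g4, and `tokenize` is a hypothetical helper.

```python
import re

# Simplified stand-ins for the real lexer rules (assumptions, not the VTL grammar).
BASE_RULES = [
    ("INTEGER", r"[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
]
# Pattern taken from the PR description: CODE_PART = [A-Za-z0-9_]+
ITEM_CODE = ("ITEM_CODE", r"[A-Za-z0-9_]+")


def tokenize(text, rules):
    """Longest match wins; on a tie the rule listed first wins (ANTLR-style)."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strict ">" keeps the earlier rule when two matches have equal length.
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


without_item_code = tokenize("1AA", BASE_RULES)
with_item_code = tokenize("1AA AA 12", BASE_RULES + [ITEM_CODE])
print(without_item_code)  # 1AA splits into an integer and an identifier
print(with_item_code)     # 1AA becomes a single ITEM_CODE token
```

Under these assumptions, `AA` and `12` still lex as `IDENTIFIER` and `INTEGER` once `ITEM_CODE` is appended, because they tie on length and the earlier rules win; only the digit-leading `1AA` falls through to `ITEM_CODE`.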

@vpinna80

vpinna80 commented Jun 5, 2026

Collaborator

Thank you for catching this; however, not only codes but every VTL identifier can start with a number. I would keep the parser grammar the same and change only the IDENTIFIER rule, from:

IDENTIFIER
  : ID_PART 
  | ID_PART COLON ID_PART ( LPAREN SDMX_VERSION RPAREN )? ( COLON (DOT | CODE_PART)+ )?
  | '\'' ( '\\\'' | ~'\'' )* '\''
;

to

IDENTIFIER
  : ([0-9] [a-zA-Z0-9_.]*)? [a-zA-Z] [a-zA-Z0-9_.]*
  | ID_PART COLON ID_PART ( LPAREN SDMX_VERSION RPAREN )? ( COLON (DOT | CODE_PART)+ )?
  | '\'' ( '\\\'' | ~'\'' )* '\''
;

@NicoLaval

Collaborator

Good point @javihern98, @vpinna80. I agree, it's good in terms of IDENTIFIER.

@javihern98

Collaborator Author

After testing, it seems the proper fix would look like this (the proposal from @vpinna80 was missing the underscore in the second group):

IDENTIFIER
  : ([0-9] [a-zA-Z0-9_.]*)? [a-zA-Z_] [a-zA-Z0-9_.]*
  | ID_PART COLON ID_PART ( LPAREN SDMX_VERSION RPAREN )? ( COLON (DOT | CODE_PART)+ )?
  | '\'' ( '\\\'' | ~'\'' )* '\''
;

@vpinna80 @NicoLaval Would you agree? If so, I will make the changes in this branch using this rule.
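
To make the difference between the two candidate rules concrete, the sketch below transcribes only the first alternative of each IDENTIFIER variant into a Python regular expression and checks a few sample names. The SDMX and quoted alternatives are left out, and the sample list and the `accepted` helper are assumptions chosen for illustration.

```python
import re

# First alternative of IDENTIFIER only; the SDMX and quoted alternatives are omitted.
# Rule as proposed by @vpinna80 (second group is [a-zA-Z]).
PROPOSED = re.compile(r"([0-9][a-zA-Z0-9_.]*)?[a-zA-Z][a-zA-Z0-9_.]*")
# Rule with the underscore added to the second group ([a-zA-Z_]).
WITH_UNDERSCORE = re.compile(r"([0-9][a-zA-Z0-9_.]*)?[a-zA-Z_][a-zA-Z0-9_.]*")

# Hypothetical sample names, picked to exercise digits and underscores.
SAMPLES = ["1AA", "AA", "A_1", "123", "1_", "_1", "_1A", "1_A"]


def accepted(pattern, samples):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if pattern.fullmatch(s)]


proposed_ok = accepted(PROPOSED, SAMPLES)
underscore_ok = accepted(WITH_UNDERSCORE, SAMPLES)
# Names that only the underscore variant accepts.
only_with_underscore = [s for s in underscore_ok if s not in proposed_ok]
print(proposed_ok)
print(underscore_ok)
print(only_with_underscore)
```

On these samples the two variants differ only on `1_`, `_1` and `_1A`; both accept `1AA` and both reject the purely numeric `123`.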

@vpinna80

vpinna80 commented Jun 8, 2026

Collaborator

That is not a mistake, unfortunately:

The regular names:

  • can contain alphabetic and numeric characters and the special characters underscore (_) and dot (.),
  • must begin with an alphanumeric character and not with a special character
  • must contain at least one alphabetic character
  • cannot be a VTL reserved word

This means that none of _1, 1_, and _1A are regular names, so you must use quotes for these.
Perhaps we should remove the 2nd condition from the user manual?
And perhaps document the SDMX names, even though I'm not so sure about directly mentioning SDMX in the user manual.
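
The four conditions quoted above can be read as a simple predicate. The following is a minimal sketch, not the reference implementation: `is_regular_name` is a hypothetical helper, and `RESERVED_WORDS` is a deliberately tiny, incomplete placeholder rather than the actual VTL reserved-word list.

```python
import re

# Hypothetical, incomplete subset of VTL reserved words, for illustration only.
RESERVED_WORDS = {"define", "end", "is", "rule"}

# Condition 1: only alphabetic and numeric characters, underscore and dot.
ALLOWED = re.compile(r"[A-Za-z0-9_.]+")


def is_regular_name(name):
    """Check a name against the four regular-name conditions quoted above."""
    if not ALLOWED.fullmatch(name):
        return False
    # Condition 2: must begin with an alphanumeric character.
    if not name[0].isalnum():
        return False
    # Condition 3: must contain at least one alphabetic character.
    if not any(ch.isalpha() for ch in name):
        return False
    # Condition 4: cannot be a reserved word (placeholder set, see above).
    return name not in RESERVED_WORDS


results = {n: is_regular_name(n) for n in ["1AA", "A.b_1", "_1", "1_", "_1A", "123", "define"]}
print(results)
```

In this reading, `_1` and `_1A` fail the second condition, while `1_` fails the third, so dropping the second condition alone would still leave `1_` needing quotes.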


4 participants