feat: Allow identifiers to begin with _#7243
Conversation
SIP-04x relaxes the `ClarityName` rules so that identifiers may begin with `_` (and the bare `_` is a valid identifier). The relaxation is wire-level unconditional but the language-level rule is gated at `ClarityVersion::Clarity6` via a new AST pass. * `clarity-types/src/representations.rs` — add a fourth alternation arm to `CLARITY_NAME_REGEX_STRING` for `^_([a-zA-Z0-9]|[-_!?+<>=/*])*$`. Tests in `clarity-types/src/tests/representations.rs` and the matching proptests in `clarity/src/vm/tests/representations.rs` are extended for the new arm and the bare-`_` case. * `clarity/src/vm/ast/parser/v2/lexer/mod.rs` — accept `_` as an identifier first-character. * `clarity/src/vm/ast/underscore_checker.rs` (new) — `BuildASTPass` that walks `pre_expressions` and rejects atoms / trait references / field identifiers beginning with `_` when `ClarityVersion < Clarity6`. Wired into `inner_build_ast` between the depth checks and `ExpressionIdentifier`. * `clarity/src/vm/ast/errors.rs` — new `UnderscoreIdentifierNotAllowed(String)` variant with a precise diagnostic. * `stackslib/src/chainstate/tests/parse_tests.rs` — exhaustive `variant_coverage_report` arm for the new variant, marked `Ignored` because it is covered by the clarity-side unit tests rather than by a consensus snapshot.
Per SIP-04x: the bare identifier `_` is a discard pattern in `let` and `match` binding positions. The bound expression is still evaluated (preserving the short-circuit effects of `try!`/`unwrap!`) but the name is not placed in scope, and repeated `_` bindings in the same form do not raise `NameAlreadyUsed`. Both the runtime and the type checker treat `_` as discard only when the contract's `ClarityVersion >= Clarity6`. Memory and cost accounting are preserved: a discard binding charges the same as a non-discard one, avoiding any gas-side-channel asymmetry. The value drops naturally at scope exit because it is never inserted into `inner_context.variables`. * `clarity/src/vm/functions/mod.rs` — `special_let` gates the conflict checks and the scope insertion on `!is_discard`. * `clarity/src/vm/functions/options.rs` — `eval_with_new_binding` (used by `match` opt/resp arms) gates the same way. * `clarity/src/vm/analysis/type_checker/v2_1/natives/mod.rs` — `check_special_let` skips `check_name_used`/`lookup_variable_type` and the `add_variable_type` for discards while still type-checking the bound expression. * `clarity/src/vm/analysis/type_checker/v2_1/natives/options.rs` — analyzer `eval_with_new_binding` gates similarly; new `use crate::vm::ClarityVersion`.
* `clarity/src/vm/tests/simple_apply_eval.rs` — five end-to-end tests:
- `test_let_discard_bare_underscore`: multiple `_` in one `let` form
coexist without `NameAlreadyUsed`.
- `test_let_discard_underscore_not_referenceable`: a `_` binding does
not place the name in scope; the body referring to `_` fails as
unbound.
- `test_let_underscore_prefix_is_regular_binding`: `_admin` (the
leading-`_` convention from the SIP example) binds normally and is
readable in the body — the leading `_` is convention only.
- `test_match_opt_discard_bare_underscore` / `_resp_…`: `_` discard
semantics extend to `match` on `(some …)` and `(ok …)/(err …)`.
* `changelog.d/underscore-identifiers.added` — release note.
The dedicated `^_(...)$` alternation arm added in bca48790fe duplicated the body character class from the letter arm. Since the body class `[-_!?+<>=/*]` already accepts `_`, the cleaner factoring is to widen the leading character class from `[a-zA-Z]` to `[a-zA-Z_]` and drop the second arm entirely. The bare-`_` case still works because the body quantifier is `*` (zero or more). Old: ^[a-zA-Z]([a-zA-Z0-9]|[-_!?+<>=/*])*$|^_([a-zA-Z0-9]|[-_!?+<>=/*])*$|^[-+=/*]$|^[<>]=?$ New: ^[a-zA-Z_]([a-zA-Z0-9]|[-_!?+<>=/*])*$|^[-+=/*]$|^[<>]=?$ The matched language is identical — purely a regex simplification. Tests in `clarity-types/src/tests/representations.rs` are unchanged; the proptest strategies in `clarity/src/vm/tests/representations.rs` collapse their two identifier branches into one and the `assert_regex_unchanged` literals are updated.
While auditing coverage on `underscore_checker.rs` I noticed two gaps:
1. The lexer dispatch for `<` only enters `read_trait_identifier` when
the next char `is_ascii_alphabetic()`. So `<_foo>` was being lexed as
three tokens (`Less`, the atom `_foo`, `Greater`) rather than as one
`Token::TraitIdent("_foo")`. The pre-Clarity-6 rejection test for
`<_foo>` was passing for the wrong reason — the underscore checker
was catching the `_foo` `Atom` branch, never the `TraitReference`
branch. Fix: also enter the trait-identifier reader when `<` is
followed by `_`, matching the relaxation done earlier in the main
identifier dispatch.
2. Missing coverage in `underscore_checker::tests`:
- The `Tuple` recursion arm of `check_one` was unexercised. New tests
`underscore_in_tuple_key_{rejected,accepted}` cover it.
- The `SugaredFieldIdentifier` and `FieldIdentifier` arms (the
`.contract.trait` and `'<addr>.contract.trait` desugarings) were
unexercised. New rejection tests cover both.
- Symmetric Clarity-6 acceptance tests for `_x` in let bindings,
`_addr` function arguments, and `<_foo>` trait references were
missing — added.
- No test verified the *specific* `ParseErrorKind` emitted. Added
`rejection_emits_underscore_identifier_not_allowed_kind`, which
uses `build_ast` (with `error_early=true`) to assert
`ParseErrorKind::UnderscoreIdentifierNotAllowed("_admin")`.
- Added `leading_operator_names_unaffected_pre_clarity6` as a
regression guard: the pass must leave `+`, `<=`, `*` etc. alone.
Test count goes from 10 to 19.
| // 3) `[<>]=?` — comparison operator name. | ||
| pub static ref CLARITY_NAME_REGEX_STRING: String = | ||
| "^[a-zA-Z]([a-zA-Z0-9]|[-_!?+<>=/*])*$|^[-+=/*]$|^[<>]=?$".into(); | ||
| "^[a-zA-Z_]([a-zA-Z0-9]|[-_!?+<>=/*])*$|^[-+=/*]$|^[<>]=?$".into(); |
There was a problem hiding this comment.
I'm concerned about changing this without a gate at this level. I think this could cause an accidental hard fork due to its use in wire-level validation. Like could a transaction with a leading _ somewhere get into a block that is accepted by updated nodes and rejected by unupdated nodes, before the epoch transition?
There was a problem hiding this comment.
You've got a point here, I can try to tighten this up and make better use the type system to "make invalid states unrepresentable"
There was a problem hiding this comment.
I'm planning to fix by leaving the current ClarityName alone and creating a new struct, ClarityNameV6, which allows the leading underscore. This should be the safest and most correct way to proceed, as we'll never be able to use a leading _ in the legacy code paths, but will also be a large diff
@brice-stacks Let me know if this sounds right to you
There was a problem hiding this comment.
Yeah I think that sounds good. Thanks.
There was a problem hiding this comment.
I ended up not doing this as the diff would have been huge. Added admission checks in StacksBlock::validate_transaction_static_epoch() instead
There was a problem hiding this comment.
I think there's still a potential divergence with runtime values. For example, a from-consensus-buff? where the buffer encodes a tuple with a key starting with an _, e.g.
(from-consensus-buff? {a: int} 0x0c00000002016100000000000000000000000000000001c8025f78000000000000000000000000000000007b)There was a problem hiding this comment.
Fixed in f1691e6 by adding check in from_consensus_buff()
Coverage Report for CI Build 26543228798Coverage decreased (-38.4%) to 47.577%Details
Uncovered Changes
Coverage Regressions88607 previously-covered lines in 344 files lost coverage.
Coverage Stats
💛 - Coveralls |
Introduces the parallel name type that Clarity 6 uses for wide
identifiers (leading `_` admitted, including the bare `_`). The
companion legacy `ClarityName` keeps its current (still-widened)
regex in this commit so the AST and analyzer continue to compile;
the narrowing of `ClarityName` back to its pre-PR form is deferred
to a follow-up commit after the AST has been migrated to consume
`ClarityNameV6` for source-level identifiers.
* `clarity-types/src/representations.rs`
- `CLARITY_NAME_V6_REGEX_STRING` / `CLARITY_NAME_V6_REGEX`: the
Clarity-6 regex (currently identical in shape to the widened
`CLARITY_NAME_REGEX_STRING`; the legacy regex will narrow in a
later commit and diverge from this one).
- `ClarityNameV6` defined via `guarded_string!`, with
`StacksMessageCodec` impl mirroring `ClarityName`.
- `impl From<ClarityName> for ClarityNameV6` (infallible
widening) and `impl TryFrom<ClarityNameV6> for ClarityName`
(fallible narrowing) for boundary conversions.
* `clarity-types/src/types/serialization.rs`
- `serialize_guarded_string!(ClarityNameV6)` so the new type is
Value-codec-ready for the forthcoming Value::TupleV6 variant.
* `clarity-types/src/tests/representations.rs`
- ClarityNameV6 validity / invalidity / codec round-trip cases.
- Widening (ClarityName -> ClarityNameV6) and narrowing
(ClarityNameV6 -> ClarityName) conversion tests, including a
`_currently_transitional` case that documents the contract for
the upcoming narrowing commit.
No production code path consumes ClarityNameV6 yet; this is the
foundation for the type-system safety property that the next set of
commits will activate.
Under the option-2 design (chosen after discussing the trade-off
with the AST migration cost), the *narrow* type is the new
addition rather than the wide one. `ClarityName` stays at its
PR-widened semantics — used by the AST, analyzer, runtime, and
Clarity-6 wire positions — and a new `LegacyClarityName` carries
the pre-Clarity-6 (narrow) regex for the wire-narrow positions
that must statically reject leading-`_` names.
This commit supersedes the foundation introduced in the previous
commit; the V6-named type and its inverted conversions are
removed in favor of the legacy-named type.
* `clarity-types/src/representations.rs`
- `LEGACY_CLARITY_NAME_REGEX_STRING` / `LEGACY_CLARITY_NAME_REGEX`:
the pre-PR narrow regex (no leading `_`).
- `LegacyClarityName` via `guarded_string!`, with
`StacksMessageCodec` impl. The codec's `try_from`-via-narrow-regex
is exactly where the consensus property gets enforced: bytes
encoding a `_`-prefixed name produce a `DeserializeError` —
matching what un-modified (pre-PR) nodes do.
- `From<LegacyClarityName> for ClarityName` (infallible widening)
and `TryFrom<ClarityName> for LegacyClarityName` (fallible
narrowing) for boundary use.
* `clarity-types/src/types/serialization.rs`
- `serialize_guarded_string!(LegacyClarityName)` so the narrow
type is ready for use as a `TupleData` key once that field
type is migrated in a follow-up commit.
* `clarity-types/src/tests/representations.rs`
- Validity / invalidity / codec round-trip cases for
`LegacyClarityName`.
- `test_legacy_clarity_name_rejects_leading_underscore`: the
type-system contract.
- `test_legacy_clarity_name_rejects_underscore_on_the_wire`: the
consensus contract. If this ever flips to passing, the
chain-split risk that motivated this refactor has silently
reappeared.
- Widening / narrowing conversion tests, including the
`test_wide_to_legacy_underscore_rejected` cases that lock in
the fallible-narrowing direction.
No production code path consumes `LegacyClarityName` yet; follow-up
commits migrate the wire-narrow positions (function_name,
asset_name, tuple keys) one at a time.
…ityName
`TransactionContractCall.function_name` is the first wire-narrow
position to receive the type-system safety property: the field
type is now `LegacyClarityName` (narrow regex), so any attempt to
construct or deserialize a contract-call transaction with a
`_`-prefixed function name is rejected by the constructor /
codec rather than the runtime.
The wire codec error wording is preserved verbatim so existing
diagnostics — and the `tx_stacks_transaction_payload_invalid_function_name`
test that keys off them — remain stable.
Type-level change:
* `stackslib/src/chainstate/stacks/mod.rs` — field type flipped,
with a doc comment noting that Clarity-6 `_`-prefixed function
calls would need a versioned payload sibling (analogous to
`VersionedSmartContract`), which is not yet introduced.
Codec / construction sites updated to the new type:
* `stackslib/src/chainstate/stacks/transaction.rs` —
`consensus_deserialize` and the `try_from_parts` constructor.
* `stackslib/src/chainstate/nakamoto/signer_set.rs` — the
`SIGNERS_VOTING_FUNCTION_NAME` literal comparison.
* `stackslib/src/util_lib/strings.rs` — new
`impl From<LegacyClarityName> for StacksString` mirroring the
existing `ClarityName` impl.
* `stackslib/src/core/test_util.rs` — `make_contract_call_tx` and
`make_contract_call_mblock_only` helpers.
* `contrib/stacks-cli/src/main.rs` — the CLI's
`make_contract_call_payload`.
Test-only construction sites mechanically updated via the same
sed pattern (each carries the new `LegacyClarityName` import and
keeps everything else unchanged):
* `stackslib/src/chainstate/coordinator/tests.rs`
* `stackslib/src/chainstate/nakamoto/tests/mod.rs`
* `stackslib/src/chainstate/stacks/transaction.rs` (tests)
* `stackslib/src/chainstate/stacks/mod.rs` (tests)
* `stackslib/src/clarity_vm/clarity.rs` (tests)
* `stackslib/src/clarity_vm/tests/ephemeral.rs`
* `stackslib/src/cost_estimates/tests/{cost_estimators,fee_medians,fee_scalar}.rs`
* `stackslib/src/net/api/tests/{blockreplay,blocksimulate}.rs`
* `stacks-node/src/event_dispatcher/tests.rs`
* `stacks-node/src/tests/{integrations,neon_integrations}.rs`
`cargo test -p clarity-types --lib` (256 tests) and
`cargo test -p stackslib --lib chainstate::stacks::transaction`
(122 tests) pass; full workspace `cargo check --all-targets`
clean with no warnings.
`AssetInfo` (used in `TransactionPostCondition::Fungible` and
`Nonfungible` post-conditions) is the second wire-narrow position
to receive the safety property: `asset_name` is now
`LegacyClarityName`, so any attempt to construct or deserialize a
post-condition that references a `_`-prefixed asset is rejected by
the constructor / codec rather than the runtime.
The two boundary points where the wire-narrow `AssetInfo.asset_name`
flows into the runtime `AssetIdentifier.asset_name` (which remains
a wide `ClarityName`, since `AssetIdentifier` lives entirely in the
VM and never crosses the wire) get an explicit `.into()` widening
conversion — `From<LegacyClarityName> for ClarityName`. The
conversion is infallible, so no error path is added.
Production code:
* `stackslib/src/chainstate/stacks/mod.rs` — `AssetInfo.asset_name`
field type flipped to `LegacyClarityName`, with a doc comment.
* `stackslib/src/chainstate/stacks/transaction.rs` —
`AssetInfo::consensus_deserialize` now reads a
`LegacyClarityName`.
* `stackslib/src/chainstate/stacks/db/transactions.rs` — the two
post-condition processing branches widen the asset name when
building `AssetIdentifier`.
Test-only construction sites mechanically updated:
* `stackslib/src/chainstate/stacks/{block,mod,transaction}.rs` —
rename `asset_name: ClarityName::*` to
`asset_name: LegacyClarityName::*`, including `let`-bindings
that feed `AssetInfo` literal struct construction.
* `stackslib/src/chainstate/stacks/db/transactions.rs` — test
sites that construct `AssetIdentifier` from `AssetInfo` add
`.into()` widening.
* `stacks-node/src/tests/nakamoto_integrations.rs` — one
construction site renamed plus `LegacyClarityName` import.
The `cargo fix` pass dropped now-unused `ClarityName` imports in
the affected files.
`cargo test -p clarity-types --lib` (256 tests) and
`cargo test -p stackslib --lib chainstate::stacks::transaction`
(122 tests) pass; full workspace `cargo check --all-targets`
clean with no warnings.
Closes the third (and largest) wire-narrow position in the
LegacyClarityName refactor: tuple keys carried inside `Value`
payloads on a transaction. Because `Value::Tuple` keys live behind
the wide `TupleData.data_map: BTreeMap<ClarityName, Value>`
(intentionally — `TupleData` is shared by the source-level VM and
must admit Clarity-6 `_foo` keys), we can't enforce the rule
statically at the codec. Instead, `StacksBlock::validate_transaction_static_epoch`
walks the embedded `Value`s and rejects offending keys.
The rule, matching the source-level AST pass:
* bare `_` is reserved as the `let` / `match` discard marker
and is rejected at every epoch
* any other leading-`_` key is rejected when
`epoch_id < StacksEpochId::Epoch40` (pre-Clarity-6). The narrow
wire codec on un-upgraded nodes already rejects every
leading-`_` name, so matching that behavior on upgraded nodes
during the upgrade window preserves consensus. Post-activation,
`_foo` tuple keys are permitted
* `clarity-types/src/types/mod.rs`
- `Value::find_invalid_tuple_key(epoch)` — recursive walker that
descends into `Tuple`, `Sequence(List)`, `Optional::Some`, and
`Response`, returning the first key that would be rejected at
`epoch`. Leaf and tuple-free variants short-circuit to `None`.
- Private `tuple_key_invalid_for_epoch` predicate that owns the
two-pronged rule.
* `clarity-types/src/tests/types/mod.rs`
- 14 walker cases: bare `_` rejected at every epoch, leading-`_`
rejected pre-Clarity-6 but admitted in Clarity-6, plain keys
admitted at every epoch, leaf variants short-circuit, and the
walker descends into `(some ...)`, `(ok ...)`, `(err ...)`,
`(list ...)`, and nested tuples.
* `stackslib/src/chainstate/stacks/block.rs`
- `validate_transaction_static_epoch` now invokes the walker on
every `ContractCall.function_args` element and on every
`Nonfungible` post-condition's `asset_value` — the two
positions where a transaction can embed a Value.
- 9 admission cases covering both ContractCall args and NFT
post-condition payloads, across pre- and post-Clarity-6 epochs.
`cargo test -p clarity-types --lib` (270 tests) and
`cargo test -p stackslib --lib chainstate::stacks::block`
(58 tests) pass; full workspace `cargo check --all-targets` clean.
Earlier commits in this branch introduced `LegacyClarityName` as a narrow refinement type at `TransactionContractCall.function_name` and `AssetInfo.asset_name`, with the wide `ClarityName` everywhere else. That hybrid (codec-level enforcement for scalars, admission-walker for tuple keys) was consensus-safe but left a permanent language-level irregularity: a Clarity-6 contract can define `(define-public (_foo) ...)` or `(define-fungible-token _coin ...)` but those names can never be referenced from external transactions, because their wire-narrow codec rejects `_foo`. The restriction would persist forever, with no language-level rationale — it's a wire-format implementation detail masquerading as a language rule. This commit unifies the design on the admission-walker pattern, which is how the Stacks codebase already gates every other epoch-conditional wire rule (SIP-040 post-conditions, originator mode, NFT MaybeSent, Nakamoto coinbase VRF, TenureChange, ...). Both upgraded and un-upgraded nodes reach the same verdict on every transaction during the upgrade window, so consensus is preserved; after Clarity-6 activates, the admission gate flips to accept and `_foo` becomes callable. Changes: * `clarity-types/src/representations.rs` — drop `LegacyClarityName` type, `LEGACY_CLARITY_NAME_REGEX_STRING`, `LEGACY_CLARITY_NAME_REGEX`, the `From<LegacyClarityName> for ClarityName` widening, the `TryFrom<ClarityName> for LegacyClarityName` narrowing, and the `StacksMessageCodec` impl. * `clarity-types/src/types/serialization.rs` — drop `serialize_guarded_string!(LegacyClarityName)`. * `clarity-types/src/tests/representations.rs` — drop the `LegacyClarityName` test suite (validity, codec round-trip, conversions). The remaining tests cover the wide `ClarityName` which now is the only name type. * `clarity/src/vm/representations.rs` — drop the `LegacyClarityName` / regex re-exports. * `stacks-codec/src/strings.rs` — drop `impl From<LegacyClarityName> for StacksString`. * `stacks-codec/src/transaction.rs` — `TransactionContractCall.function_name` and `AssetInfo.asset_name` revert to `ClarityName`. `new_contract_call` constructor uses `ClarityName::try_from` again. * `stackslib/src/chainstate/stacks/block.rs::validate_transaction_static_epoch` — extended to reject leading-`_` `function_name` and `asset_name` (Fungible + Nonfungible post-conditions) when `epoch_id < StacksEpochId::Epoch40`. Tuple-key check unchanged. Five new admission tests cover the scalar paths (`rejects_leading_underscore_function_name_pre_clarity6`, `rejects_leading_underscore_asset_name_pre_clarity6`, `admits_leading_underscore_scalars_in_clarity6`, `admits_plain_scalar_names`) plus existing tuple-key tests. * `stackslib/src/chainstate/stacks/db/transactions.rs` — the runtime widening `.into()` calls at `AssetIdentifier::asset_name` are now no-ops (both sides are `ClarityName`); dropped along with their inline `// Widen LegacyClarityName -> ClarityName` comments. * `stackslib/src/chainstate/stacks/transaction.rs` — codec round-trip tests for leading-`_` `function_name` and `asset_name` flip semantics: the codec now *accepts* the bytes (deserialization succeeds), and the admission tests in `block.rs` lock the consensus rule. * All previously-migrated test construction sites (~20 files across stackslib, stacks-node, contrib, ...) revert to `ClarityName::try_from` / `ClarityName::from_literal`. * `clarity/src/vm/ast/underscore_checker.rs` — module doc refreshed to point at the admission walker (no longer at `LegacyClarityName`). Net result: one wide `ClarityName` type, no language-level restrictions on leading `_`, single uniform epoch-gated admission pattern. The PR's source-level relaxation can stand on its own without any permanent wire-format or type-system irregularities. `cargo check --workspace --all-targets` clean; targeted tests all pass (clarity-types: 236; transaction.rs: 124; block.rs: 40).
Description
As of Clarity 6, allow all Clarity identifiers to start with
_, and allow bare_as discard identifier inlet/matchbindingsThis is intended to be used in combination with the Clarinet linter to allow users to signal that an identifier is intentionally unused
Applicable issues
_#7097Additional info (benefits, drawbacks, caveats)
Important
A few choices I made during this PR that I want to point out to reviewers:
_is allowed for all identifiers, even where it isn't helpful for linting (like public function names)_identifier is only allowed inletandmatchbindings. Using it elsewhere will generate an error on contract analysis.let/matchidentifiers, but use different codepathsThe SIP has also been updated to reflect these
Checklist
docs/property-testing.md)changelog.d/README.md)rpc/openapi.yamlfor RPC endpoints,event-dispatcher.mdfor new events)clarity-benchmarkingrepo