Skip to content

chore(deps): consolidate NUL-strip onto wxyc_etl::pg::to_pg_text_form#51

Merged
jakebromberg merged 1 commit into
mainfrom
chore/wxyc-etl-0.8-sweep
Jun 9, 2026
Merged

chore(deps): consolidate NUL-strip onto wxyc_etl::pg::to_pg_text_form#51
jakebromberg merged 1 commit into
mainfrom
chore/wxyc-etl-0.8-sweep

Conversation

@jakebromberg

Copy link
Copy Markdown
Member

Summary

Step 2 of the WX-3.B follow-up consumer sweep (WXYC/wxyc-etl#142). Delegates this repo's NUL-strip arm to wxyc_etl::pg::to_pg_text_form so the WXYC/docs#18 PG TEXT U+0000 strip policy lives in one place across the org.

What changed

  • src/import.rs:58escape_copy_text is preserved (it does more than NUL strip — it also escapes \\, \\t, \\n, \\r for the PG COPY TEXT format), but the NUL-strip arm is now a first pipeline step that calls the upstream helper. Existing single-pass becomes two-pass under delegation; common case (no NULs) stays zero-copy via the upstream Cow::Borrowed short-circuit. The five escape_copy_text_* tests at L135-160 assert composed behavior and pass unchanged.
  • src/wxyc_loader.rs:190 — local strip_pg_null_bytes + strip_pg_null_bytes_opt and their two unit tests are gone. The six call-sites use to_pg_text_form(...).into_owned() directly.
  • Cargo.tomlwxyc-etl = \"0.6.0\"\"0.8\".

Scope note — second helper not in the issue audit

The original #142 table for this repo called out only src/import.rs:58 (escape_copy_text). After cloning, I found a second NUL-strip helper at src/wxyc_loader.rs:190 (strip_pg_null_bytes) added later during the cross-cache-identity rollout (E1 §4.1.3) and missed by the issue's audit. Both implement the same WXYC/docs#18 policy. Leaving the second behind would defeat the consolidation goal ("policy lives in one place"), so this PR sweeps both. If the reviewer prefers to split the wxyc_loader.rs portion into a follow-up, that's straightforward.

Performance note (already in the issue)

The two-pass behavior added by delegating escape_copy_text's NUL strip:

  • No-NUL inputs: zero-copy via upstream Cow::Borrowed — same as before.
  • NUL-bearing inputs: one extra allocation. Rare in real Wikidata data per the issue.

CI dependency

This PR bumps to wxyc-etl = \"0.8\". CI will fail until WXYC/wxyc-etl#143 merges and a v0.8.0 tag publishes to crates.io. The Rust crate API at wxyc_etl::pg::to_pg_text_form is unchanged from 0.7.0, so this PR is functionally complete today — it's only the version constraint that needs the publish.

Test plan

  • cargo test --lib — 36 tests pass (locally verified with a [patch.crates-io] pointer at the 0.8.0 worktree; patch reverted before commit)
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --check clean
  • PG import integration tests require TEST_DATABASE_URL; will run in CI
  • CI green after WXYC/wxyc-etl#143 publishes

Towards WXYC/wxyc-etl#142.

WX-3.B follow-up sweep (WXYC/wxyc-etl#142). The local NUL-strip policy in two places — `src/import.rs::escape_copy_text` and `src/wxyc_loader.rs::strip_pg_null_bytes` — both implement the WXYC/docs#18 PG TEXT U+0000 strip. Delegate the NUL-strip step to `wxyc_etl::pg::to_pg_text_form` so the policy lives in one place.

In `src/import.rs`, `escape_copy_text` keeps the PG COPY TEXT escaping (backslash, tab, newline, carriage return are reserved in-band metacharacters) but the NUL-strip arm is now a first pipeline step that calls the upstream helper. Existing single-pass implementation becomes two-pass under delegation; the common case (no NULs) stays zero-copy thanks to the upstream `Cow::Borrowed` short-circuit, only NUL-bearing inputs pay an extra allocation, which is rare in real Wikidata data. The five `escape_copy_text_*` tests at L135-160 assert composed behavior (not the helper) and pass unchanged.

In `src/wxyc_loader.rs`, drop `strip_pg_null_bytes` + `strip_pg_null_bytes_opt` and their two unit tests. The six call-sites in `populate_wxyc_library_v2` use the upstream helper directly via `to_pg_text_form(&s).into_owned()` to land owned `String` values for the INSERT params. The `Option<&str>` sites use `Option::map(|s| to_pg_text_form(s).into_owned())`, which mirrors the new Python contract: `None` passes through.

Bumps the wxyc-etl pin to `"0.8"`. The Rust crate API at `wxyc_etl::pg::to_pg_text_form` is unchanged from 0.7.0; the 0.8.0 bump is the upstream's Python contract fix (`Option<&str> -> Option<String>` replacing the `None -> ""` coercion). This Rust consumer is unaffected by the contract change but moves with the org-wide cycle.

Scope note: the original WXYC/wxyc-etl#142 table called out only `src/import.rs:58` (escape_copy_text). The `src/wxyc_loader.rs:190` helper was added later during the cross-cache-identity rollout (E1 §4.1.3) and was not in the issue's audit. Both implement the same WXYC/docs#18 policy and are swept together here so the consolidation goal — "policy lives in one place" — is actually achieved for this repo.

Test plan: `cargo test --lib` (36 tests pass, 2 strip_pg_null_bytes tests deleted, 5 escape_copy_text_* tests still pass unchanged — contract delegation is invisible at the boundary). `cargo clippy --all-targets -- -D warnings` clean. `cargo fmt --check` clean. PG import integration tests (`tests/import_*` requiring `TEST_DATABASE_URL`) not run locally; will run in CI.
@jakebromberg jakebromberg merged commit f4b89cb into main Jun 9, 2026
3 of 6 checks passed
@jakebromberg jakebromberg deleted the chore/wxyc-etl-0.8-sweep branch June 9, 2026 03:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant