chore(deps): consolidate NUL-strip onto wxyc_etl::pg::to_pg_text_form#51
Merged
Conversation
WX-3.B follow-up sweep (WXYC/wxyc-etl#142). The local NUL-strip policy in two places — `src/import.rs::escape_copy_text` and `src/wxyc_loader.rs::strip_pg_null_bytes` — both implement the WXYC/docs#18 PG TEXT U+0000 strip. Delegate the NUL-strip step to `wxyc_etl::pg::to_pg_text_form` so the policy lives in one place. In `src/import.rs`, `escape_copy_text` keeps the PG COPY TEXT escaping (backslash, tab, newline, carriage return are reserved in-band metacharacters) but the NUL-strip arm is now a first pipeline step that calls the upstream helper. Existing single-pass implementation becomes two-pass under delegation; the common case (no NULs) stays zero-copy thanks to the upstream `Cow::Borrowed` short-circuit, only NUL-bearing inputs pay an extra allocation, which is rare in real Wikidata data. The five `escape_copy_text_*` tests at L135-160 assert composed behavior (not the helper) and pass unchanged. In `src/wxyc_loader.rs`, drop `strip_pg_null_bytes` + `strip_pg_null_bytes_opt` and their two unit tests. The six call-sites in `populate_wxyc_library_v2` use the upstream helper directly via `to_pg_text_form(&s).into_owned()` to land owned `String` values for the INSERT params. The `Option<&str>` sites use `Option::map(|s| to_pg_text_form(s).into_owned())`, which mirrors the new Python contract: `None` passes through. Bumps the wxyc-etl pin to `"0.8"`. The Rust crate API at `wxyc_etl::pg::to_pg_text_form` is unchanged from 0.7.0; the 0.8.0 bump is the upstream's Python contract fix (`Option<&str> -> Option<String>` replacing the `None -> ""` coercion). This Rust consumer is unaffected by the contract change but moves with the org-wide cycle. Scope note: the original WXYC/wxyc-etl#142 table called out only `src/import.rs:58` (escape_copy_text). The `src/wxyc_loader.rs:190` helper was added later during the cross-cache-identity rollout (E1 §4.1.3) and was not in the issue's audit. Both implement the same WXYC/docs#18 policy and are swept together here so the consolidation goal — "policy lives in one place" — is actually achieved for this repo. Test plan: `cargo test --lib` (36 tests pass, 2 strip_pg_null_bytes tests deleted, 5 escape_copy_text_* tests still pass unchanged — contract delegation is invisible at the boundary). `cargo clippy --all-targets -- -D warnings` clean. `cargo fmt --check` clean. PG import integration tests (`tests/import_*` requiring `TEST_DATABASE_URL`) not run locally; will run in CI.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Step 2 of the WX-3.B follow-up consumer sweep (WXYC/wxyc-etl#142). Delegates this repo's NUL-strip arm to
wxyc_etl::pg::to_pg_text_formso the WXYC/docs#18 PG TEXT U+0000 strip policy lives in one place across the org.What changed
src/import.rs:58—escape_copy_textis preserved (it does more than NUL strip — it also escapes\\,\\t,\\n,\\rfor the PG COPY TEXT format), but the NUL-strip arm is now a first pipeline step that calls the upstream helper. Existing single-pass becomes two-pass under delegation; common case (no NULs) stays zero-copy via the upstreamCow::Borrowedshort-circuit. The fiveescape_copy_text_*tests at L135-160 assert composed behavior and pass unchanged.src/wxyc_loader.rs:190— localstrip_pg_null_bytes+strip_pg_null_bytes_optand their two unit tests are gone. The six call-sites useto_pg_text_form(...).into_owned()directly.Cargo.toml—wxyc-etl = \"0.6.0\"→\"0.8\".Scope note — second helper not in the issue audit
The original #142 table for this repo called out only
src/import.rs:58(escape_copy_text). After cloning, I found a second NUL-strip helper atsrc/wxyc_loader.rs:190(strip_pg_null_bytes) added later during the cross-cache-identity rollout (E1 §4.1.3) and missed by the issue's audit. Both implement the same WXYC/docs#18 policy. Leaving the second behind would defeat the consolidation goal ("policy lives in one place"), so this PR sweeps both. If the reviewer prefers to split thewxyc_loader.rsportion into a follow-up, that's straightforward.Performance note (already in the issue)
The two-pass behavior added by delegating
escape_copy_text's NUL strip:Cow::Borrowed— same as before.CI dependency
This PR bumps to
wxyc-etl = \"0.8\". CI will fail until WXYC/wxyc-etl#143 merges and av0.8.0tag publishes to crates.io. The Rust crate API atwxyc_etl::pg::to_pg_text_formis unchanged from 0.7.0, so this PR is functionally complete today — it's only the version constraint that needs the publish.Test plan
cargo test --lib— 36 tests pass (locally verified with a[patch.crates-io]pointer at the 0.8.0 worktree; patch reverted before commit)cargo clippy --all-targets -- -D warningscleancargo fmt --checkcleanTEST_DATABASE_URL; will run in CITowards WXYC/wxyc-etl#142.