
Add experimental SQLite row-stream apply path #250

Draft
adamziel wants to merge 2 commits into trunk from codex/opt-row-stream-20260522010757


adamziel commented May 22, 2026

Collaborator

What it does

Adds an opt-in --experimental-sqlite-row-stream path for SQLite db-apply. The flag is passed to db-pull, which writes the sidecar, and to db-apply, which consumes it.

When enabled during db-pull, the importer writes .import-sqlite-row-stream.jsonl: a JSONL sidecar with one record per SQL statement in db.sql. Producer-shaped INSERT statements are stored as structured records (table, columns, and typed row values); unsupported statements store byte ranges into db.sql so SQLite apply can fall back to the existing SQL path.
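To make the three record kinds concrete, here is a minimal sketch of building and serializing sidecar records. The field names (`type`, `sql_bytes`, `table`, `columns`, `rows`, `start`, `end`) and the `[kind, payload]` cell encoding are assumptions for illustration; the PR does not publish the actual schema, and the real producer is PHP.

```python
import json

# Hypothetical record shapes; the PR does not publish the sidecar schema.
# Each row cell is a [kind, payload] pair: kind is "null", "empty", "num", or "b64".

def meta_record(sql_byte_length):
    # Header record: lets db-apply confirm the sidecar matches this db.sql.
    return {"type": "meta", "sql_bytes": sql_byte_length}

def insert_record(table, columns, rows):
    # Structured INSERT: typed values only, no SQL text.
    return {"type": "insert", "table": table, "columns": columns, "rows": rows}

def fallback_record(start, end):
    # Unsupported statement: byte range [start, end) into db.sql.
    return {"type": "sql", "start": start, "end": end}

def encode_sidecar(records):
    # JSONL: one compact JSON object per line, each line newline-terminated.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)
```

Because every line is an independent JSON object, db-apply can stream the file line by line and track its progress as a plain byte offset.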

When enabled during SQLite db-apply, the importer consumes the sidecar and executes structured inserts through cached PDO prepared statements without reparsing the original SQL text. Fallback records still execute through the existing structured URL rewriting and MySQL-on-SQLite path.
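The apply loop can be sketched as follows, with Python's `sqlite3` standing in for PHP's PDO. The record shapes match the hypothetical schema above, `run_fallback_sql` is a hypothetical hook standing in for the existing URL-rewriting and MySQL-on-SQLite path, and base64 payloads are assumed to be UTF-8 text.

```python
import base64
import json
import sqlite3

def quote_ident(name):
    # SQLite identifier quoting: wrap in double quotes, double embedded quotes.
    return '"' + name.replace('"', '""') + '"'

def to_param(cell):
    kind, value = cell
    if kind == "null":
        return None
    if kind == "b64":
        # Assumption: payloads are UTF-8 text (binary data would stay as bytes).
        return base64.b64decode(value).decode("utf-8")
    return value  # "empty" -> "", "num" -> numeric text (column affinity converts it)

def apply_sidecar(conn, sidecar_lines, sql_bytes, run_fallback_sql):
    templates = {}  # (table, columns) -> INSERT template, built once per shape
    for line in sidecar_lines:
        record = json.loads(line)
        if record["type"] == "insert":
            key = (record["table"], tuple(record["columns"]))
            sql = templates.get(key)
            if sql is None:
                cols = ",".join(quote_ident(c) for c in record["columns"])
                marks = ",".join("?" for _ in record["columns"])
                sql = f"INSERT INTO {quote_ident(record['table'])} ({cols}) VALUES ({marks})"
                templates[key] = sql
            # sqlite3 keeps an LRU cache of prepared statements keyed by SQL text,
            # so reusing the same template string avoids re-preparing it.
            conn.executemany(sql, ([to_param(c) for c in row] for row in record["rows"]))
        elif record["type"] == "sql":
            # Fallback: hand the original statement bytes to the existing SQL path.
            run_fallback_sql(sql_bytes[record["start"]:record["end"]].decode("utf-8"))
    return len(templates)  # number of distinct insert shapes seen
```

The template depends only on the table and column list, so one prepared statement serves every row of that shape and the original SQL text is never reparsed.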

Rationale

Profiling the current PHP.wasm SQLite apply shows that most of its time goes into rebuilding prepared INSERT templates from SQL text. The sidecar moves that structural parse into a single pass during db-pull, so db-apply works from typed row records.

This keeps the correctness boundary explicit:

  • no domain detection from raw strpos shortcuts
  • no unstructured SQL guessing during apply
  • no decoded payload bytes embedded into SQL text
  • fallback to the existing SQL byte range path when the producer shape is unsupported
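FastInsertScanner itself is not shown in this PR. As a hypothetical illustration of the "producer shape or fallback" boundary, the sketch below accepts only one strict INSERT shape (backtick-quoted names, values limited to `NULL`, `''`, plain numbers, and `FROM_BASE64('...')`) and returns `None` for anything else, which the caller would record as a byte-range fallback. The accepted shape is an assumption, not the producer's documented format.

```python
import re

# Hypothetical producer shape: INSERT INTO `t` (`a`,`b`) VALUES (...),(...);
HEAD = re.compile(r"INSERT INTO `(\w+)` \(([^)]*)\) VALUES ")
VALUE = re.compile(r"NULL|''|-?\d+(?:\.\d+)?|FROM_BASE64\('([A-Za-z0-9+/=]*)'\)")

def scan_insert(stmt):
    """Return a structured insert record, or None to request the SQL fallback."""
    m = HEAD.match(stmt)
    if not m:
        return None
    columns = []
    for part in m.group(2).split(","):
        cm = re.fullmatch(r"`(\w+)`", part)
        if not cm:
            return None
        columns.append(cm.group(1))
    pos, rows = m.end(), []
    while True:
        if pos >= len(stmt) or stmt[pos] != "(":
            return None
        pos += 1
        row = []
        while True:
            vm = VALUE.match(stmt, pos)
            if not vm:
                return None  # unsupported value (e.g. a plain quoted string)
            text = vm.group(0)
            if text == "NULL":
                row.append(["null", None])
            elif text == "''":
                row.append(["empty", ""])
            elif text.startswith("FROM_BASE64"):
                row.append(["b64", vm.group(1)])
            else:
                row.append(["num", text])
            pos = vm.end()
            if stmt.startswith(",", pos):
                pos += 1
            elif stmt.startswith(")", pos):
                pos += 1
                break
            else:
                return None
        if len(row) != len(columns):
            return None
        rows.append(row)
        if stmt.startswith(",", pos):
            pos += 1
        elif stmt[pos:] == ";":
            return {"type": "insert", "table": m.group(1), "columns": columns, "rows": rows}
        else:
            return None
```

Rejecting early on any deviation keeps the structured path narrow: nothing is guessed, and every unrecognized statement still reaches the existing SQL path unchanged.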

Implementation

Adds SQLiteRowStreamSidecar, which:

  • builds meta, insert, and fallback SQL records
  • uses FastInsertScanner for producer-shaped rows
  • stores null, empty string, numeric, and base64 payload values as typed records
  • rebuilds stable SQLite prepared templates from the structured record shape
  • decodes and rewrites base64 payload values only at bind time
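A minimal sketch of the typed value round trip, assuming the same hypothetical `[kind, payload]` cells as above. `rewrite` is a caller-supplied function standing in for the importer's structured URL rewriter; it is applied only to decoded payloads, and the result is returned as a bind parameter rather than spliced into SQL text.

```python
import base64

def encode_cell(value):
    # Hypothetical typed encoding of one column value as [kind, payload].
    if value is None:
        return ["null", None]
    if isinstance(value, (int, float)):
        return ["num", str(value)]
    if value == "":
        return ["empty", ""]
    # Every other string travels as base64 so arbitrary content survives JSON intact.
    return ["b64", base64.b64encode(value.encode("utf-8")).decode("ascii")]

def bind_value(cell, rewrite):
    kind, payload = cell
    if kind == "null":
        return None
    if kind == "b64":
        # Decode only now, at bind time, then rewrite the decoded text.
        return rewrite(base64.b64decode(payload).decode("utf-8"))
    # "empty" and "num" pass through untouched; they never contain URLs.
    return payload
```

Keeping payloads encoded until bind time means the sidecar itself never needs to know the target site URL, so the same sidecar can be applied with different rewrite settings.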

Importer integration:

  • db-pull --experimental-sqlite-row-stream writes the sidecar after db.sql completes
  • db-apply --experimental-sqlite-row-stream --target-engine=sqlite consumes that sidecar
  • resume state tracks row_stream_bytes_read alongside existing SQL byte progress, including partial row-stream resumes
  • sidecar metadata validates against the current db.sql byte length
  • benchmark runner accepts BENCH_DB_PULL_EXTRA_ARGS and BENCH_DB_APPLY_EXTRA_ARGS
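The metadata check and the resume offset can be sketched together. This assumes the hypothetical `sql_bytes` meta field from the earlier sketch and treats `row_stream_bytes_read` as the byte offset just past the last fully applied record; the real importer's resume state format is not shown here.

```python
import io
import json

class SidecarMismatch(Exception):
    """Raised when the sidecar was not produced from the db.sql being applied."""

def open_row_stream(sidecar, sql_byte_length, row_stream_bytes_read=0):
    # Accept raw bytes for convenience; real use would pass a file opened in "rb".
    if isinstance(sidecar, (bytes, bytearray)):
        sidecar = io.BytesIO(sidecar)
    meta = json.loads(sidecar.readline())
    # Hypothetical "sql_bytes" field: db.sql length recorded at db-pull time.
    if meta.get("type") != "meta" or meta.get("sql_bytes") != sql_byte_length:
        raise SidecarMismatch("sidecar does not match the current db.sql length")
    # Resume: skip past records whose end offset was already persisted.
    if row_stream_bytes_read > sidecar.tell():
        sidecar.seek(row_stream_bytes_read)
    return _records(sidecar)

def _records(sidecar):
    # Yield (byte offset after this record, record). Persist the offset only after
    # the record's writes commit, so a resume never skips an unapplied record.
    for line in iter(sidecar.readline, b""):
        yield sidecar.tell(), json.loads(line)
```

In practice `sql_byte_length` would come from the size of db.sql on disk (for example `os.path.getsize`), so a sidecar left over from an earlier pull is rejected instead of being applied against the wrong byte ranges.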

Benchmarks were serialized with flock on /tmp/reprint-bench.lock so runs did not overlap.

| Stage | origin/trunk | row-stream branch | Delta |
|---|---|---|---|
| playground-sqlite-db-apply | 73.07 s | 59.65 s | -13.42 s (-18.4%) |
| playground-sqlite-db-pull | 383.17 s | 342.72 s | -40.45 s (-10.6%) |
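The Delta column is the branch time minus the origin/trunk time, with the percentage taken relative to origin/trunk. A small helper (hypothetical, not part of the benchmark runner) reproduces both rows:

```python
def delta(baseline_s, branch_s):
    # Absolute change in seconds and percentage change relative to the baseline.
    diff = branch_s - baseline_s
    return round(diff, 2), round(100 * diff / baseline_s, 1)
```

For example, `delta(73.07, 59.65)` gives `(-13.42, -18.4)`, matching the db-apply row.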

Baseline commands, run on origin/trunk:

flock /tmp/reprint-bench.lock -c 'bash -lc '\''set -o pipefail; BENCH_STAGES=playground-sqlite-db-apply node tests/e2e/benchmark/bench-pull.mjs 2>&1 | tee .context/bench-trunk-playground-sqlite-db-apply.log'\'''
flock /tmp/reprint-bench.lock -c 'bash -lc '\''set -o pipefail; BENCH_STAGES=playground-sqlite-db-pull node tests/e2e/benchmark/bench-pull.mjs 2>&1 | tee .context/bench-trunk-playground-sqlite-db-pull.log'\'''

Branch commands:

flock /tmp/reprint-bench.lock -c 'bash -lc '\''set -o pipefail; BENCH_STAGES=playground-sqlite-db-apply BENCH_DB_PULL_EXTRA_ARGS=--experimental-sqlite-row-stream BENCH_DB_APPLY_EXTRA_ARGS=--experimental-sqlite-row-stream node tests/e2e/benchmark/bench-pull.mjs 2>&1 | tee .context/bench-branch-row-stream-playground-sqlite-db-apply.log'\'''
flock /tmp/reprint-bench.lock -c 'bash -lc '\''set -o pipefail; BENCH_STAGES=playground-sqlite-db-pull BENCH_DB_PULL_EXTRA_ARGS=--experimental-sqlite-row-stream node tests/e2e/benchmark/bench-pull.mjs 2>&1 | tee .context/bench-branch-row-stream-playground-sqlite-db-pull.log'\'''

Testing instructions

php -l packages/reprint-importer/src/lib/url-rewrite/class-sqlite-row-stream-sidecar.php
php -l packages/reprint-importer/src/import.php
php -l tests/Import/NewSiteUrlSqliteTest.php
./vendor/bin/phpunit tests/UrlRewriting/SQLiteRowStreamSidecarTest.php --colors=never
./vendor/bin/phpunit tests/Import/NewSiteUrlSqliteTest.php --filter ExperimentalRowStream --colors=never
./vendor/bin/phpunit tests/Import/NewSiteUrlSqliteTest.php --colors=never
./vendor/bin/phpunit tests/UrlRewriting/SQLiteRowStreamSidecarTest.php tests/UrlRewriting/SQLitePreparedInsertBuilderTest.php --colors=never
./vendor/bin/phpunit tests/UrlRewriting --colors=never


github-actions Bot commented May 22, 2026

Contributor

Pull pipeline performance — large-directory

Site: large-directory · 2,000+ files plus targeted file-transfer scenarios · 10,000 posts · 25,000 postmeta · PHP 8.5.6

| Stage | PR | trunk | Δ | Status |
|---|---|---|---|---|
| playground-sqlite-db-pull | 9.56 s | 9.33 s | +236 ms (+2.5%) | ⚪ |
| playground-sqlite-db-apply | 3.53 s | 3.61 s | -81 ms (-2.2%) | ⚪ |
| Total | 13.09 s | 12.93 s | +155 ms (+1.2%) | ⚪ |

Details (identical for PR and trunk):

playground-sqlite-db-pull: condition=db-pull in PHP.wasm, runtime=php.wasm 8.3, wp_mysql_parser=enabled, mode=lexer, native_lexer=verified, native_token_stream=WP_MySQL_Native_Token_Stream, native_token_count=18, native_parser=selected

playground-sqlite-db-apply: condition=db-apply to SQLite in PHP.wasm, runtime=php.wasm 8.3, wp_mysql_parser=enabled, mode=parser, native_lexer=verified, native_token_stream=WP_MySQL_Native_Token_Stream, native_token_count=18, native_parser=verified, native_ast=WP_MySQL_Native_Parser_Node, sqlite_driver_parser=verified

Numbers carry runner noise; treat single-run deltas as directional, not authoritative.

📈 Trunk performance history — commit-by-commit timeline.
