Speed up symlink following with optional concurrency#184
Draft
adamziel wants to merge 17 commits into
Draft
Conversation
Contributor
Pull pipeline performance —
|
| Stage | PR | trunk | Δ | Status | Details |
|---|---|---|---|---|---|
files-pull-large-part-peak-memory |
2.80 s | 2.67 s | ⚪ +123 ms (+4.6%) | ✓ | condition=files-pull large multipart file part file=24.0 MiB chunk_size=16.0 MiB peak_memory=10.0 MiB trunk: condition=files-pull large multipart file part file=24.0 MiB chunk_size=16.0 MiB peak_memory=10.0 MiB |
| Total | 2.80 s | 2.67 s | ⚪ +123 ms (+4.6%) |
Numbers carry runner noise; treat single-run deltas as directional, not authoritative.
📈 Trunk performance history — commit-by-commit timeline.
When a site has many symlinked directories — a common pattern on wp.com Atomic where plugins, themes, and mu-plugins each live under a different shared root — the file-index phase must fetch a remote directory listing for every target directory it discovers. Doing these one at a time means the total wait is the sum of all those network round-trips. This adds --symlink-follow-concurrency=N to process up to N directories simultaneously. The default stays at 1 (identical to the previous sequential path) to avoid any regression on normal sites. Pass --no-adaptive and the default rises to 5 automatically, since users opting out of adaptive pacing are typically trying to maximise throughput. Pass an explicit value to override either default; the range is clamped to [1, 32]. The new concurrent code path uses a rolling-window slot model: up to N directory slots are active at once, each advancing page by page through its own cursor. A committed watermark ensures the remote index file is only appended to in slot order — if slot N stalls, slots ahead of it buffer their data in memory until slot N finishes, at which point everything drains in one ordered write. On SIGTERM the in-flight slot state (directory, cursor, attempt count) is saved and restored on resume so no already-fetched data is discarded. The sequential path (concurrency == 1) is left byte-for-byte unchanged; the new code only activates when N > 1.
The pull command validates --filter after handling --symlink-follow-concurrency. By eagerly persisting the resolved concurrency to disk on every run, the state file got created even when an invalid --filter caused the command to throw moments later. Tests covering pre-validation invariants (state must not exist after a rejected option) regressed. Now the state is only written when the user actually passes the option, matching the pattern used by the other persisted options. Resolution into the in-memory property still happens on every run so concurrent symlink following stays available to the rest of the request. Also tightens local types in the new concurrent indexer so PHPStan can follow the slot/buffered-done arrays through their by-ref closure captures, and drops an unused use clause that PHPStan flagged.
The new --symlink-follow-concurrency flag is meant to speed up sites with many directory symlinks pointing outside the web root. Existing follow- symlinks coverage only exercises a handful of links, which is fine for correctness but not enough to see the rolling-window effect. This test stands up 30 directory symlinks to external targets and runs files-sync --follow-symlinks twice — once with the default sequential behavior and once with a window of 5 — asserts both runs download every payload, and prints both wall-clock durations so the speedup is visible in the test log.
The earlier version had 30 symlinks and no artificial latency, so on a loopback PHP built-in server both runs finished in milliseconds and swamped any signal from the rolling window. This injects a 5-second sleep into every export-API request via a php -S router script, so the sequential run takes roughly 74×5s ≈ 6 minutes and the concurrency=5 run takes roughly 75 seconds — large enough that the speedup shows up clearly in the test log even on a quiet runner.
ensureSite stands up an nginx vhost on the port we registered for the benchmark site, so php -S could not bind to it and exited 1 immediately. Move the delay server to 18120 and tell the importer to hit that port directly via getSiteUrl's port override.
…g the plugin The previous router required the plugin's index.php directly, which is the WordPress plugin entry — without WordPress booted around it the request crashed with HTTP 500. Switch the router to a thin reverse proxy: sleep 5s on export-API requests, then forward to the real nginx vhost on the registered port and stream the response back. The export endpoint now runs against full WordPress as it does in every other E2E test, with the artificial latency layered on top.
The php -S reverse proxy approach broke the exporter's HMAC body-hash check because php-cli-server consumes multipart request bodies into $_POST, leaving php://input empty when the proxy tried to forward them. Drop the proxy entirely. ensureSite already gives us a real WordPress behind nginx, so we just write an mu-plugin into the site that sleeps five seconds on every export-API request. mu-plugins run before regular plugins, so the sleep fires before the exporter's ?reprint-api handler intercepts the request — and after WordPress has parsed the body, so HMAC verification still sees the unmodified payload.
The mu-plugin now keeps a flock'd counter of currently-handling requests and a max-observed gauge in /tmp. The test resets the gauge before each run, reads it back afterwards, prints it alongside the wall-clock numbers, and asserts the concurrent run actually overlapped on the server. If the gauge stays at 1 we know the rolling window is serializing — either the client never issued requests in parallel or php-fpm could not hand out multiple workers — and we surface that as a failure rather than letting a 1.01x 'speedup' silently pass.
The previous concurrent path dispatched one synchronous request per slot in round-robin order, which is sequential on the wire. Wall-clock on a 74-symlink site behind a 5s server delay was 1.01x — no parallelism at all. Replace that loop with curl_multi_init / add_handle / exec / select. Each in-flight slot now owns one cURL handle on a shared multi handle, so up to N requests are truly racing in parallel and the rolling-window watermark advances as transfers actually finish. The slot's body is buffered through CURLOPT_WRITEFUNCTION and the multipart boundary is captured in CURLOPT_HEADERFUNCTION; once curl_multi_info_read reports a handle done, the body is fed through the existing MultipartStreamParser to extract data lines and the next cursor — same shape the old fetch_one_index_page() returned, just no longer blocking the dispatcher while it parses. The sequential concurrency=1 path is unchanged.
74 symlinks plus 5s of artificial latency push the exporter past the WASM PHP 256MB memory budget — utils.php OOMs inside the exporter on the very first request. The benchmark is about importer-side wall-clock dispatch, not WASM-runtime compatibility, so skip it when PHP_BINARY points at the Playground CLI shim. The other E2E PHP-version jobs still cover the test against native PHP+nginx+php-fpm.
Drop the second sequential code path under discover_symlink_targets and just call the pool with concurrency clamped to max(1, N). The N=1 case runs one request at a time, so it's wire-equivalent to the old loop, but both paths now share the same rolling-window state machine, which is where the resume semantics live. To make 'resume from the oldest unfulfilled request' actually hold when the importer crashes mid-window, each slot that finishes ahead of the committed watermark now writes its accumulated jsonl to a per-slot file under <state-dir>/.symlink-pool/. If a RuntimeException tears the importer down, the catch persists committed/buffered_done/in_flight to state and the sidecars stay on disk; the next run skips re-issuing the already-buffered slots and only re-asks for the one that didn't fulfil. When the watermark finally crosses a buffered slot, its sidecar drains into the remote index file in order and the file is removed. A clean run also rmdirs the sidecar directory on the way out. Adds an E2E scenario that forces the middle slot of a five-deep window to fail with HTTP 502 once, asserts the importer aborts with sidecars on disk, asserts the resume succeeds without re-asking for any of the non-failing targets, and confirms every payload landed locally.
The drain loop was throwing the moment any slot reported an error, which abandoned every other slot that had completed in the same multi_info_read pass — so when the failing slot raced ahead of the successful ones their data never made it into a sidecar. The resume test caught this: the post-failure sidecar directory was empty. Queue the first error in a local variable instead and only re-throw it once every finished transfer in the current pass has been processed. The successful slots now write their sidecars before the outer try/catch persists state, and the resume run drains them in watermark order without re-asking for any of the non-failing targets.
The previous fix only deferred the throw to the end of one drain pass, which still tore the multi handle down while sibling slots were on the wire — they never got the chance to land their sidecars and the resume test caught a missing pool directory. Track errored slot ids in a set instead, stop dispatching new directories and stop re-attaching the failing slot, but keep driving curl_multi until every other in-flight handle settles. Only when handles_by_slot is empty does the outer try/catch see the throw, persist state, and re-raise. The slots that succeeded alongside the failure now reliably end up as buffered_done sidecars and the next run only re-asks for the one that didn't fulfil.
… pool Resuming after a mid-window failure was redispatching every target because: (1) restored in-flight slots were re-pushed to the queue and took fresh slot ids, leaving a hole at the failed slot's id and freezing the watermark behind it; (2) buffered-done dirs weren't marked visited, so the queue happily redispatched them as new slots. Restore in-flight slots into $slots under their original slot id with their saved cursor + attempts, mark each restored or buffered dir as visited, and let the existing re-attach pass pick them up. The pool now resumes from the failed slot only — buffered sidecars drain in order once the watermark crosses the previously-failed slot id.
93e2b77 to
0e5f4a5
Compare
Resume of the rolling-window pool is still redispatching every target even though state[symlink_pool] is supposed to mark them visited. Add an audit log entry on both ends — one when persist_pool_state writes the snapshot, one when the next run picks it back up — so the CI log shows whether the restore branch actually fires and what shape the persisted state has.
audit_log only echoes to stdout in verbose mode, so the previous trace never made it into the test's captured output. Use output_progress so the persist/restore events show up in the runImporter result and CI log unconditionally.
normalize_state() runs every state write through array_intersect_key against default_state(). The pool snapshot key was never added there, so persist_pool_state was writing the watermark + buffered slots into the in-memory state, calling save_state, and silently losing them on the way to disk. The next run loaded a state without symlink_pool, the restore branch never fired, the queue rebuilt all five targets from scratch, and the resume hit every directory a second time. Add symlink_pool (defaulting to null) to default_state so the field survives normalization. Remove the persist/resume trace probes that were only added to confirm the diagnosis.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When a site has many symlinked directories — a common pattern on wp.com Atomic where plugins, themes, and mu-plugins each live under a different shared root — the file-index phase must fetch a remote directory listing for every target it discovers. Doing these one at a time means the total wait grows linearly with the number of symlink targets.
This adds
--symlink-follow-concurrency=Nso the importer can process up to N symlink target directories simultaneously. The default stays at 1, keeping behaviour identical to the previous sequential path for everyone who doesn't opt in. Pass--no-adaptiveand the default rises to 5 automatically, on the theory that users disabling adaptive pacing want maximum throughput. Pass an explicit value to override; the range is clamped to [1, 32] and persisted in state so it survives resume cycles.The concurrent path uses a rolling-window slot model driven by
curl_multi: up to N requests are truly in flight at once. A committed watermark keeps the remote index file append-only in slot order — if slot N stalls mid-directory, any slots ahead of it hold their data in per-slot sidecar files until N finishes, then everything drains together. On SIGTERM or a mid-window failure the in-flight state (directory, cursor, attempt count) is persisted and restored on the next run, so already-fetched data is preserved and only the unfulfilled slot is re-issued.Symlink-follow benchmark
End-to-end test
tests/e2e/tests/import-51-symlink-follow-concurrency.test.jsstands up 74 symlinked directories on a real WordPress site behind nginx + php-fpm and injects a 5-second sleep into every export-API request via an mu-plugin so per-request latency dominates. It runsfiles-sync --follow-symlinkstwice and prints both wall-clock numbers + the max simultaneous in-flight requests the server saw.Latest CI run on
E2E (PHP 8.2):So a site that took ~6m52s to follow all symlinks now takes ~1m45s with a window of 5. The "max in-flight: 5" line is the proof that curl_multi really keeps the pipeline full — if the importer were still serializing client-side it would stay at 1.
How this differs from the
Pull pipeline performancecommentThe sticky bot comment posted on this PR comes from the
pull-pipeline-perfworkflow, which times a singlefiles-pull-large-part-peak-memoryscenario: one large multipart file body on alarge-directoryfixture, no symlinks involved. That benchmark exercises a completely different code path (multipart streaming + memory ceiling), and it's expected to show no change here (≈ runner noise) because this PR doesn't touch it. The numbers above are the ones that actually reflect what changed.The sequential code path (concurrency == 1) is left completely unchanged.