Skip to content

Speed up symlink following with optional concurrency#184

Draft
adamziel wants to merge 17 commits into
trunkfrom
adamziel/concurrent-symlinks
Draft

Speed up symlink following with optional concurrency#184
adamziel wants to merge 17 commits into
trunkfrom
adamziel/concurrent-symlinks

Conversation

@adamziel

@adamziel adamziel commented Apr 28, 2026

Copy link
Copy Markdown
Collaborator

When a site has many symlinked directories — a common pattern on wp.com Atomic where plugins, themes, and mu-plugins each live under a different shared root — the file-index phase must fetch a remote directory listing for every target it discovers. Doing these one at a time means the total wait grows linearly with the number of symlink targets.

This adds --symlink-follow-concurrency=N so the importer can process up to N symlink target directories simultaneously. The default stays at 1, keeping behaviour identical to the previous sequential path for everyone who doesn't opt in. Pass --no-adaptive and the default rises to 5 automatically, on the theory that users disabling adaptive pacing want maximum throughput. Pass an explicit value to override; the range is clamped to [1, 32] and persisted in state so it survives resume cycles.

The concurrent path uses a rolling-window slot model driven by curl_multi: up to N requests are truly in flight at once. A committed watermark keeps the remote index file append-only in slot order — if slot N stalls mid-directory, any slots ahead of it hold their data in per-slot sidecar files until N finishes, then everything drains together. On SIGTERM or a mid-window failure the in-flight state (directory, cursor, attempt count) is persisted and restored on the next run, so already-fetched data is preserved and only the unfulfilled slot is re-issued.

Symlink-follow benchmark

End-to-end test tests/e2e/tests/import-51-symlink-follow-concurrency.test.js stands up 74 symlinked directories on a real WordPress site behind nginx + php-fpm and injects a 5-second sleep into every export-API request via an mu-plugin so per-request latency dominates. It runs files-sync --follow-symlinks twice and prints both wall-clock numbers + the max simultaneous in-flight requests the server saw.

Latest CI run on E2E (PHP 8.2):

74 symlinks, 5s server delay
  concurrency=1: 412,279 ms (max in-flight on server: 1)
  concurrency=5: 104,548 ms (max in-flight on server: 5)
  speedup: 3.94x

So a site that took ~6m52s to follow all symlinks now takes ~1m45s with a window of 5. The "max in-flight: 5" line is the proof that curl_multi really keeps the pipeline full — if the importer were still serializing client-side it would stay at 1.

How this differs from the Pull pipeline performance comment

The sticky bot comment posted on this PR comes from the pull-pipeline-perf workflow, which times a single files-pull-large-part-peak-memory scenario: one large multipart file body on a large-directory fixture, no symlinks involved. That benchmark exercises a completely different code path (multipart streaming + memory ceiling), and it's expected to show no change here (≈ runner noise) because this PR doesn't touch it. The numbers above are the ones that actually reflect what changed.

The sequential code path (concurrency == 1) is left completely unchanged.

@github-actions

github-actions Bot commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Pull pipeline performance — large-directory

Site: large-directory · 2,000+ plus targeted file-transfer scenarios files · 320,007 posts · 720,015 postmeta · PHP 8.5.5

Stage PR trunk Δ Status Details
files-pull-large-part-peak-memory 2.80 s 2.67 s ⚪ +123 ms (+4.6%) condition=files-pull large multipart file part
file=24.0 MiB
chunk_size=16.0 MiB
peak_memory=10.0 MiB
trunk: condition=files-pull large multipart file part
file=24.0 MiB
chunk_size=16.0 MiB
peak_memory=10.0 MiB
Total 2.80 s 2.67 s ⚪ +123 ms (+4.6%)

Numbers carry runner noise; treat single-run deltas as directional, not authoritative.

📈 Trunk performance history — commit-by-commit timeline.

adamziel added 14 commits May 4, 2026 17:51
When a site has many symlinked directories — a common pattern on wp.com Atomic where plugins, themes, and mu-plugins each live under a different shared root — the file-index phase must fetch a remote directory listing for every target directory it discovers. Doing these one at a time means the total wait is the sum of all those network round-trips.

This adds --symlink-follow-concurrency=N to process up to N directories simultaneously. The default stays at 1 (identical to the previous sequential path) to avoid any regression on normal sites. Pass --no-adaptive and the default rises to 5 automatically, since users opting out of adaptive pacing are typically trying to maximise throughput. Pass an explicit value to override either default; the range is clamped to [1, 32].

The new concurrent code path uses a rolling-window slot model: up to N directory slots are active at once, each advancing page by page through its own cursor. A committed watermark ensures the remote index file is only appended to in slot order — if slot N stalls, slots ahead of it buffer their data in memory until slot N finishes, at which point everything drains in one ordered write. On SIGTERM the in-flight slot state (directory, cursor, attempt count) is saved and restored on resume so no already-fetched data is discarded.

The sequential path (concurrency == 1) is left byte-for-byte unchanged; the new code only activates when N > 1.
The pull command validates --filter after handling --symlink-follow-concurrency. By eagerly persisting the resolved concurrency to disk on every run, the state file got created even when an invalid --filter caused the command to throw moments later. Tests covering pre-validation invariants (state must not exist after a rejected option) regressed.

Now the state is only written when the user actually passes the option, matching the pattern used by the other persisted options. Resolution into the in-memory property still happens on every run so concurrent symlink following stays available to the rest of the request.

Also tightens local types in the new concurrent indexer so PHPStan can follow the slot/buffered-done arrays through their by-ref closure captures, and drops an unused use clause that PHPStan flagged.
The new --symlink-follow-concurrency flag is meant to speed up sites with
many directory symlinks pointing outside the web root. Existing follow-
symlinks coverage only exercises a handful of links, which is fine for
correctness but not enough to see the rolling-window effect.

This test stands up 30 directory symlinks to external targets and runs
files-sync --follow-symlinks twice — once with the default sequential
behavior and once with a window of 5 — asserts both runs download every
payload, and prints both wall-clock durations so the speedup is visible
in the test log.
The earlier version had 30 symlinks and no artificial latency, so on a
loopback PHP built-in server both runs finished in milliseconds and
swamped any signal from the rolling window. This injects a 5-second
sleep into every export-API request via a php -S router script, so the
sequential run takes roughly 74×5s ≈ 6 minutes and the concurrency=5
run takes roughly 75 seconds — large enough that the speedup shows up
clearly in the test log even on a quiet runner.
ensureSite stands up an nginx vhost on the port we registered for the
benchmark site, so php -S could not bind to it and exited 1 immediately.
Move the delay server to 18120 and tell the importer to hit that port
directly via getSiteUrl's port override.
…g the plugin

The previous router required the plugin's index.php directly, which is the
WordPress plugin entry — without WordPress booted around it the request
crashed with HTTP 500. Switch the router to a thin reverse proxy: sleep 5s
on export-API requests, then forward to the real nginx vhost on the
registered port and stream the response back. The export endpoint now
runs against full WordPress as it does in every other E2E test, with the
artificial latency layered on top.
The php -S reverse proxy approach broke the exporter's HMAC body-hash
check because php-cli-server consumes multipart request bodies into
$_POST, leaving php://input empty when the proxy tried to forward them.
Drop the proxy entirely. ensureSite already gives us a real WordPress
behind nginx, so we just write an mu-plugin into the site that sleeps
five seconds on every export-API request. mu-plugins run before regular
plugins, so the sleep fires before the exporter's ?reprint-api handler
intercepts the request — and after WordPress has parsed the body, so
HMAC verification still sees the unmodified payload.
The mu-plugin now keeps a flock'd counter of currently-handling requests
and a max-observed gauge in /tmp. The test resets the gauge before each
run, reads it back afterwards, prints it alongside the wall-clock numbers,
and asserts the concurrent run actually overlapped on the server. If the
gauge stays at 1 we know the rolling window is serializing — either the
client never issued requests in parallel or php-fpm could not hand out
multiple workers — and we surface that as a failure rather than letting a
1.01x 'speedup' silently pass.
The previous concurrent path dispatched one synchronous request per slot
in round-robin order, which is sequential on the wire. Wall-clock on a
74-symlink site behind a 5s server delay was 1.01x — no parallelism at
all. Replace that loop with curl_multi_init / add_handle / exec /
select. Each in-flight slot now owns one cURL handle on a shared multi
handle, so up to N requests are truly racing in parallel and the
rolling-window watermark advances as transfers actually finish.

The slot's body is buffered through CURLOPT_WRITEFUNCTION and the
multipart boundary is captured in CURLOPT_HEADERFUNCTION; once
curl_multi_info_read reports a handle done, the body is fed through
the existing MultipartStreamParser to extract data lines and the next
cursor — same shape the old fetch_one_index_page() returned, just no
longer blocking the dispatcher while it parses. The sequential
concurrency=1 path is unchanged.
74 symlinks plus 5s of artificial latency push the exporter past the
WASM PHP 256MB memory budget — utils.php OOMs inside the exporter on
the very first request. The benchmark is about importer-side wall-clock
dispatch, not WASM-runtime compatibility, so skip it when PHP_BINARY
points at the Playground CLI shim. The other E2E PHP-version jobs
still cover the test against native PHP+nginx+php-fpm.
Drop the second sequential code path under discover_symlink_targets and
just call the pool with concurrency clamped to max(1, N). The N=1 case
runs one request at a time, so it's wire-equivalent to the old loop, but
both paths now share the same rolling-window state machine, which is
where the resume semantics live.

To make 'resume from the oldest unfulfilled request' actually hold when
the importer crashes mid-window, each slot that finishes ahead of the
committed watermark now writes its accumulated jsonl to a per-slot file
under <state-dir>/.symlink-pool/. If a RuntimeException tears the
importer down, the catch persists committed/buffered_done/in_flight to
state and the sidecars stay on disk; the next run skips re-issuing the
already-buffered slots and only re-asks for the one that didn't
fulfil. When the watermark finally crosses a buffered slot, its
sidecar drains into the remote index file in order and the file is
removed. A clean run also rmdirs the sidecar directory on the way out.

Adds an E2E scenario that forces the middle slot of a five-deep window
to fail with HTTP 502 once, asserts the importer aborts with sidecars
on disk, asserts the resume succeeds without re-asking for any of the
non-failing targets, and confirms every payload landed locally.
The drain loop was throwing the moment any slot reported an error,
which abandoned every other slot that had completed in the same
multi_info_read pass — so when the failing slot raced ahead of the
successful ones their data never made it into a sidecar. The resume
test caught this: the post-failure sidecar directory was empty.

Queue the first error in a local variable instead and only re-throw it
once every finished transfer in the current pass has been processed.
The successful slots now write their sidecars before the outer
try/catch persists state, and the resume run drains them in
watermark order without re-asking for any of the non-failing
targets.
The previous fix only deferred the throw to the end of one drain pass,
which still tore the multi handle down while sibling slots were on the
wire — they never got the chance to land their sidecars and the
resume test caught a missing pool directory. Track errored slot ids in
a set instead, stop dispatching new directories and stop re-attaching
the failing slot, but keep driving curl_multi until every other
in-flight handle settles. Only when handles_by_slot is empty does the
outer try/catch see the throw, persist state, and re-raise. The slots
that succeeded alongside the failure now reliably end up as
buffered_done sidecars and the next run only re-asks for the one
that didn't fulfil.
… pool

Resuming after a mid-window failure was redispatching every target
because: (1) restored in-flight slots were re-pushed to the queue and
took fresh slot ids, leaving a hole at the failed slot's id and
freezing the watermark behind it; (2) buffered-done dirs weren't
marked visited, so the queue happily redispatched them as new slots.

Restore in-flight slots into $slots under their original slot id with
their saved cursor + attempts, mark each restored or buffered dir as
visited, and let the existing re-attach pass pick them up. The pool
now resumes from the failed slot only — buffered sidecars drain in
order once the watermark crosses the previously-failed slot id.
@adamziel adamziel force-pushed the adamziel/concurrent-symlinks branch from 93e2b77 to 0e5f4a5 Compare May 4, 2026 15:52
adamziel added 3 commits May 4, 2026 22:58
Resume of the rolling-window pool is still redispatching every target
even though state[symlink_pool] is supposed to mark them visited.
Add an audit log entry on both ends — one when persist_pool_state
writes the snapshot, one when the next run picks it back up — so the
CI log shows whether the restore branch actually fires and what
shape the persisted state has.
audit_log only echoes to stdout in verbose mode, so the previous
trace never made it into the test's captured output. Use
output_progress so the persist/restore events show up in the
runImporter result and CI log unconditionally.
normalize_state() runs every state write through array_intersect_key
against default_state(). The pool snapshot key was never added there,
so persist_pool_state was writing the watermark + buffered slots into
the in-memory state, calling save_state, and silently losing them on
the way to disk. The next run loaded a state without symlink_pool,
the restore branch never fired, the queue rebuilt all five targets
from scratch, and the resume hit every directory a second time.

Add symlink_pool (defaulting to null) to default_state so the field
survives normalization. Remove the persist/resume trace probes that
were only added to confirm the diagnosis.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant