perf(vfs): readdir snapshot cache, opendir/releasedir on VirtualFs #172
Draft
XciD wants to merge 9 commits into
The kernel paginates readdir over a large directory by calling our handler repeatedly with growing `offset`. Each call previously did `virtual_fs.readdir(ino)`, which acquires the inode-table read lock and rebuilds a fresh Vec<VirtualFsDirEntry> of every child. For a 1000+ entry dir that's 3-4 full snapshots per ls. Snapshot once on opendir, cache by fh as Arc<Vec<...>>, drop in releasedir. readdir iterates the cached snapshot by reference, skipping offset entries — no lock, no rebuild on continuation. The fallback path (readdir without a matching opendir, defensive only — FUSE always issues opendir first) still snapshots on the fly, so behaviour is preserved.
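The flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Entry`, `Fs`, `Adapter`, the `page_size` parameter and the rebuild counter are all hypothetical stand-ins, and the real handler replies through fuser rather than returning a `Vec`.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `VirtualFsDirEntry`.
#[derive(Clone, Debug, PartialEq)]
struct Entry {
    ino: u64,
    name: String,
}

/// Hypothetical stand-in for the filesystem: one directory plus a rebuild counter.
struct Fs {
    children: Vec<Entry>,
    rebuilds: AtomicUsize,
}

impl Fs {
    /// The expensive path: copy every child into a fresh Vec.
    fn snapshot(&self) -> Vec<Entry> {
        self.rebuilds.fetch_add(1, Ordering::Relaxed);
        self.children.clone()
    }

    fn rebuild_count(&self) -> usize {
        self.rebuilds.load(Ordering::Relaxed)
    }
}

/// Hypothetical adapter holding the per-handle snapshot cache.
struct Adapter {
    fs: Fs,
    next_fh: AtomicU64,
    dir_cache: Mutex<HashMap<u64, Arc<Vec<Entry>>>>,
}

impl Adapter {
    fn new(children: Vec<Entry>) -> Self {
        Adapter {
            fs: Fs { children, rebuilds: AtomicUsize::new(0) },
            next_fh: AtomicU64::new(1),
            dir_cache: Mutex::new(HashMap::new()),
        }
    }

    /// opendir: snapshot once and cache it under a fresh file handle.
    fn opendir(&self) -> u64 {
        let fh = self.next_fh.fetch_add(1, Ordering::Relaxed);
        let snapshot = Arc::new(self.fs.snapshot());
        self.dir_cache.lock().unwrap().insert(fh, snapshot);
        fh
    }

    /// readdir: reuse the cached snapshot; rebuild only if no entry exists for `fh`.
    fn readdir(&self, fh: u64, offset: usize, page_size: usize) -> Vec<Entry> {
        // The cache mutex is held only for the lookup and the Arc clone.
        let cached = self.dir_cache.lock().unwrap().get(&fh).cloned();
        let entries = cached.unwrap_or_else(|| Arc::new(self.fs.snapshot()));
        entries.iter().skip(offset).take(page_size).cloned().collect()
    }

    /// releasedir: drop the cached snapshot.
    fn releasedir(&self, fh: u64) {
        self.dir_cache.lock().unwrap().remove(&fh);
    }
}
```

With this shape, a paginated listing costs one snapshot no matter how many continuation calls the kernel issues; only a `readdir` on an unknown `fh` pays for a rebuild.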
- Drop duplicated doc on opendir (dir_cache field comment is the canonical 'why').
- Replace iter+skip(offset) with slice indexing entries[offset..], bounded by min(offset, len) to avoid panic on stale cookies.
- Keep the defensive fallback in readdir even though fuser always pairs OPENDIR/READDIR: cost is zero on the cached path and the function-level comment explains it.
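The bounded slice indexing mentioned above might look like the sketch below. The helper name `entries_from` is hypothetical, and the offset is assumed to have already been converted to `usize`.

```rust
/// Return the entries at or after `offset`.
/// A stale cookie (offset past the end) yields an empty slice instead of a panic.
fn entries_from<T>(entries: &[T], offset: usize) -> &[T] {
    // Clamp first: `entries[offset..]` panics when `offset > entries.len()`.
    let start = offset.min(entries.len());
    &entries[start..]
}
```

Clamping keeps the continuation path branch-free: an out-of-range cookie simply produces an empty page, which the caller can report as end-of-directory.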
Quantifies the per-call cost the dir_cache PR avoids. On 1000-entry dirs the work per FUSE round-trip drops from ~23 µs to ~7 ns (snapshot rebuild → Arc clone). For a typical paginated ls of 12 round-trips, the total drops from 279 µs to 83 ns.

Run with:

    cargo test --release --lib readdir_snapshot_vs_cache_cost -- --nocapture --ignored

Measured:

    n=  100  pages=  2  old= 4.88 µs  new=  83 ns  ratio=    58x
    n= 1000  pages= 12  old=  279 µs  new=  83 ns  ratio=  3364x
    n= 5000  pages= 62  old= 7.61 ms  new= 416 ns  ratio= 18296x
The dir_cache lived in FuseAdapter, so NFS couldn't benefit. Move opendir / readdir_snapshot / releasedir onto VirtualFs:

- VirtualFs::opendir(ino) snapshots children, allocates fh, pins the inode against eviction.
- VirtualFs::readdir_snapshot(fh) returns the cached Arc<Vec<...>> (None if the fh wasn't issued by opendir; the caller falls back).
- VirtualFs::releasedir(ino, fh) drops the snapshot and the pin.

FuseAdapter just delegates. Drops the now-redundant pub(crate) bump_open_handles/drop_open_handles wrappers in VirtualFs (the inode-table methods are still there, only the FUSE-specific wrappers go away).

NFS can plug in by calling the same VirtualFs APIs at its handle lifecycle hooks (separate change).
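A rough sketch of the handle lifecycle described in this commit follows. Only the three method names and their arguments come from the commit message; the `State` layout, the single mutex, the `Option<u64>` return type and the `is_pinned` helper are assumptions made for illustration, and the real inode table is certainly more involved.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Clone, Debug, PartialEq)]
struct VirtualFsDirEntry {
    ino: u64,
    name: String,
}

/// Hypothetical internal state; one mutex guards everything for brevity.
#[derive(Default)]
struct State {
    children: HashMap<u64, Vec<VirtualFsDirEntry>>, // dir ino -> children
    open_handles: HashMap<u64, usize>,              // ino -> pin count
    dir_cache: HashMap<u64, Arc<Vec<VirtualFsDirEntry>>>, // fh -> snapshot
    next_fh: u64,
}

#[derive(Default)]
struct VirtualFs {
    state: Mutex<State>,
}

impl VirtualFs {
    /// Snapshot the children, allocate an fh, and pin the inode.
    fn opendir(&self, ino: u64) -> Option<u64> {
        let mut s = self.state.lock().unwrap();
        let snapshot = Arc::new(s.children.get(&ino)?.clone());
        s.next_fh += 1;
        let fh = s.next_fh;
        s.dir_cache.insert(fh, snapshot);
        *s.open_handles.entry(ino).or_insert(0) += 1;
        Some(fh)
    }

    /// Return the cached snapshot, or None if `fh` was not issued by opendir.
    fn readdir_snapshot(&self, fh: u64) -> Option<Arc<Vec<VirtualFsDirEntry>>> {
        self.state.lock().unwrap().dir_cache.get(&fh).cloned()
    }

    /// Drop the snapshot and release the pin taken by opendir.
    fn releasedir(&self, ino: u64, fh: u64) {
        let mut s = self.state.lock().unwrap();
        if s.dir_cache.remove(&fh).is_some() {
            let remaining = s.open_handles.get_mut(&ino).map(|count| {
                *count = count.saturating_sub(1);
                *count
            });
            if remaining == Some(0) {
                s.open_handles.remove(&ino);
            }
        }
    }

    /// Hypothetical helper: whether any open handle pins `ino`.
    fn is_pinned(&self, ino: u64) -> bool {
        self.state.lock().unwrap().open_handles.contains_key(&ino)
    }
}
```

Because nothing here depends on FUSE types, any front end with an open/close hook for directory handles could drive the same three calls.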
This reverts commit 2e4e3f3.
Summary
The kernel paginates `readdir` over a large directory by calling our handler repeatedly with a growing `offset`. Each call previously invoked `virtual_fs.readdir(ino)`, which:

- Acquires the inode-table read lock.
- Rebuilds a fresh `Vec<VirtualFsDirEntry>` containing every child entry.

For a 1000+ entry directory (large model repos, datasets) the kernel typically issues 3–4 round-trips, so we did 3–4 full snapshots per `ls`.

This PR snapshots once on `opendir` and caches it keyed by `fh`:

- `opendir` → `virtual_fs.readdir()` once, `Arc<Vec<...>>` stored in `dir_cache: Mutex<HashMap<u64, _>>`.
- `readdir(fh, offset)` → looks up the cached `Arc`, iterates by reference from `offset`. No lock, no rebuild.
- `releasedir(fh)` → drops the cache entry.

`readdir` keeps a defensive fallback that snapshots on the fly if no cache entry exists (FUSE always issues `opendir` first, but this guards against the contract being relaxed).

Tradeoff
The cache holds the snapshot for the lifetime of the open dir handle. A concurrent `rmdir`/`unlink` won't be reflected mid-iteration, but POSIX already leaves it unspecified whether `readdir` returns entries added or removed after `opendir`, so snapshot semantics are acceptable.

Tests
329 lib + 339 nfs tests pass, clippy clean. Existing FUSE integration tests in `tests/fuse_ops.rs` exercise readdir paths.