perf(vfs): readdir snapshot cache, opendir/releasedir on VirtualFs #172

Draft
XciD wants to merge 9 commits into main from perf/fuse-readdir-cache
XciD commented May 20, 2026

Summary

The kernel paginates readdir over a large directory by calling our handler repeatedly with a growing offset. Each call previously invoked virtual_fs.readdir(ino), which:

  1. Acquires the inode-table read lock.
  2. Rebuilds a fresh Vec<VirtualFsDirEntry> containing every child entry.

For a 1000+ entry directory (large model repos, datasets) the kernel typically issues 3–4 round-trips, so we did 3–4 full snapshots per ls.

This PR snapshots once on opendir and caches it keyed by fh:

  • opendir → virtual_fs.readdir() once, Arc<Vec<...>> stored in dir_cache: Mutex<HashMap<u64, _>>.
  • readdir(fh, offset) → looks up the cached Arc, iterates by reference from offset. No inode-table lock, no rebuild.
  • releasedir(fh) → drops the cache entry.

readdir keeps a defensive fallback that snapshots on the fly if no cache entry exists. FUSE always issues opendir first, but the fallback guards against that contract being relaxed.
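
The handle lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DirEntry` is a hypothetical stand-in for VirtualFsDirEntry, `DirCache` and its method signatures are invented for the example, and the snapshot is passed in by the caller rather than read from an inode table. Note that the sketch still takes the small dir_cache mutex for the lookup; what it avoids is rebuilding the entry list on every page.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for VirtualFsDirEntry (real fields are not shown in this PR).
#[derive(Clone, Debug, PartialEq)]
pub struct DirEntry {
    pub ino: u64,
    pub name: String,
}

/// Per-handle snapshot cache, keyed by the fh handed out at opendir time.
#[derive(Default)]
pub struct DirCache {
    snapshots: Mutex<HashMap<u64, Arc<Vec<DirEntry>>>>,
}

impl DirCache {
    /// opendir: store the snapshot once under `fh`.
    pub fn opendir(&self, fh: u64, snapshot: Vec<DirEntry>) {
        self.snapshots.lock().unwrap().insert(fh, Arc::new(snapshot));
    }

    /// readdir: return the cached snapshot (a cheap Arc clone). If no entry
    /// exists for `fh`, fall back to building a snapshot on the fly.
    pub fn readdir(
        &self,
        fh: u64,
        fallback: impl FnOnce() -> Vec<DirEntry>,
    ) -> Arc<Vec<DirEntry>> {
        // The mutex guard is dropped at the end of this statement, so the
        // fallback below never runs while the cache is locked.
        let cached = self.snapshots.lock().unwrap().get(&fh).cloned();
        cached.unwrap_or_else(|| Arc::new(fallback()))
    }

    /// releasedir: drop the snapshot for `fh`.
    pub fn releasedir(&self, fh: u64) {
        self.snapshots.lock().unwrap().remove(&fh);
    }
}
```

In this sketch a readdir on a handle that was never opened, or was already released, simply goes through the fallback closure, which mirrors the defensive path described above.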

Tradeoff

The cache holds the snapshot for the lifetime of the open dir handle. A concurrent rmdir/unlink won't be reflected mid-iteration, which POSIX permits: whether readdir returns entries added or removed after opendir is unspecified, so snapshot semantics are normal.

Tests

329 lib + 339 nfs tests pass, clippy clean. Existing FUSE integration tests in tests/fuse_ops.rs exercise readdir paths.

The kernel paginates readdir over a large directory by calling our
handler repeatedly with growing `offset`. Each call previously did
`virtual_fs.readdir(ino)`, which acquires the inode-table read
lock and rebuilds a fresh Vec<VirtualFsDirEntry> of every child.
For a 1000+ entry dir that's 3-4 full snapshots per ls.

Snapshot once on opendir, cache by fh as Arc<Vec<...>>, drop in
releasedir. readdir iterates the cached snapshot by reference,
skipping offset entries — no lock, no rebuild on continuation.

The fallback path (readdir without a matching opendir, defensive
only — FUSE always issues opendir first) still snapshots on the
fly, so behaviour is preserved.
@github-actions

POSIX Compliance (pjdfstest)

============================================================
  pjdfstest POSIX Compliance Results
------------------------------------------------------------
  Files: 130/130 passed    Tests: 832 total (0 subtests failed)
  Result: PASS
------------------------------------------------------------
  Category               Passed    Total   Status
  -------------------- -------- -------- --------
  chflags                     5        5       OK
  chmod                       8        8       OK
  chown                       6        6       OK
  ftruncate                  13       13       OK
  granular                    5        5       OK
  mkdir                       9        9       OK
  open                       19       19       OK
  posix_fallocate             1        1       OK
  rename                     10       10       OK
  rmdir                      11       11       OK
  symlink                    10       10       OK
  truncate                   13       13       OK
  unlink                     11       11       OK
  utimensat                   9        9       OK
============================================================

github-actions Bot commented May 20, 2026

Benchmark Results

============================================================
  Benchmark — 50MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                    223.6 MB/s     122.2 MB/s
  Sequential re-read                2154.9 MB/s    2241.5 MB/s
  Range read (1MB@25MB)                0.5 ms         0.2 ms
  Random reads (100x4KB avg)           0.0 ms         0.0 ms
  Sequential write (FUSE)           1213.1 MB/s
  Close latency (CAS+Hub)            0.133 s
  Write end-to-end                   286.2 MB/s
  Dedup write                       1558.5 MB/s
  Dedup close latency                0.114 s
  Dedup end-to-end                   341.7 MB/s
============================================================
============================================================
  Benchmark — 200MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                    811.8 MB/s     883.8 MB/s
  Sequential re-read                2145.3 MB/s    2249.4 MB/s
  Range read (1MB@25MB)                0.3 ms         0.2 ms
  Random reads (100x4KB avg)           0.0 ms         0.0 ms
  Sequential write (FUSE)           1373.5 MB/s
  Close latency (CAS+Hub)            0.144 s
  Write end-to-end                   690.9 MB/s
  Dedup write                       1560.2 MB/s
  Dedup close latency                0.262 s
  Dedup end-to-end                   512.6 MB/s
============================================================
============================================================
  Benchmark — 500MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                   1000.9 MB/s    1314.9 MB/s
  Sequential re-read                2108.9 MB/s    2222.0 MB/s
  Range read (1MB@25MB)                0.3 ms         0.2 ms
  Random reads (100x4KB avg)           0.0 ms         0.0 ms
  Sequential write (FUSE)           1499.5 MB/s
  Close latency (CAS+Hub)            0.173 s
  Write end-to-end                   987.1 MB/s
  Dedup write                       1399.0 MB/s
  Dedup close latency                0.179 s
  Dedup end-to-end                   931.5 MB/s
============================================================
============================================================
  fio Benchmark Results
------------------------------------------------------------
  Job                        FUSE MB/s   NFS MB/s  FUSE IOPS   NFS IOPS
  ------------------------- ---------- ---------- ---------- ----------
  seq-read-100M                  540.5      416.7                      
  seq-reread-100M               2439.0       88.1                      
  rand-read-4k-100M                0.1        0.1         19         19
  seq-read-5x10M                 625.0      769.2                      
  rand-read-10x1M                  0.1        0.1         38         37
  Random Read Latency           FUSE avg      NFS avg
  ------------------------- ------------ ------------
  rand-read-4k-100M           52690.4 us   51372.6 us
  rand-read-10x1M             26467.3 us   26899.9 us
============================================================

XciD added 6 commits May 20, 2026 23:15
- Drop duplicated doc on opendir (dir_cache field comment is the
  canonical 'why').
- Replace iter+skip(offset) with slice indexing entries[offset..],
  bounded by min(offset, len) to avoid panic on stale cookies.
- Keep the defensive fallback in readdir even though fuser always
  pairs OPENDIR/READDIR — cost is zero on the cached path and the
  function-level comment explains it.
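
A minimal sketch of the clamped slice indexing mentioned in this commit, assuming the offset arrives as a signed 64-bit cookie. `page_from` is a hypothetical helper name, and treating a negative offset as 0 is an assumption of this example, not something the commit states.

```rust
use std::convert::TryFrom;

/// Return the entries at or after `offset`.
/// The start index is clamped to the snapshot length, so a stale or
/// oversized cookie yields an empty page instead of an out-of-bounds panic.
/// Assumption for this sketch: a negative offset is treated as 0.
pub fn page_from<T>(entries: &[T], offset: i64) -> &[T] {
    let start = usize::try_from(offset).unwrap_or(0).min(entries.len());
    &entries[start..]
}
```

Compared with iter().skip(offset), the slice form makes the bound explicit and hands the caller a plain `&[T]` to iterate until the reply buffer is full.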
Quantifies the per-call cost the dir_cache PR avoids. On 1000-entry
dirs the work per FUSE round-trip drops from ~23 µs to ~7 ns
(snapshot rebuild → Arc clone). For a typical paginated ls of such a
dir (12 round-trips), the total drops from 279 µs to 83 ns.

Run with:
  cargo test --release --lib readdir_snapshot_vs_cache_cost -- --nocapture --ignored

Measured:
  n=  100  pages=  2  old=  4.88 µs  new=   83 ns  ratio=   58x
  n= 1000  pages= 12  old=  279 µs   new=   83 ns  ratio= 3364x
  n= 5000  pages= 62  old=  7.61 ms  new=  416 ns  ratio=18296x
The dir_cache lived in FuseAdapter, so NFS couldn't benefit. Move
opendir / readdir_snapshot / releasedir onto VirtualFs:

- VirtualFs::opendir(ino) snapshots children, allocates fh, pins
  the inode against eviction.
- VirtualFs::readdir_snapshot(fh) returns the cached Arc<Vec<...>>
  (None if the fh wasn't issued by opendir — caller falls back).
- VirtualFs::releasedir(ino, fh) drops the snapshot and the pin.

FuseAdapter just delegates. Drops the now-redundant pub(crate)
bump_open_handles/drop_open_handles wrappers in VirtualFs (the
inode-table methods are still there, only the FUSE-specific
wrappers go away). NFS can plug in by calling the same VirtualFs
APIs at its handle lifecycle hooks (separate change).
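
The three VirtualFs entry points listed in this commit could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: `ToyVfs`, `Entry`, `is_pinned`, and the field names are hypothetical, the children map stands in for the inode table, and a per-inode open-handle count stands in for the eviction pin.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for VirtualFsDirEntry.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub ino: u64,
    pub name: String,
}

/// Toy model of VirtualFs: children per directory inode, a per-inode
/// open-handle count (the "pin"), and the per-fh snapshot cache.
pub struct ToyVfs {
    children: Mutex<HashMap<u64, Vec<Entry>>>,
    open_handles: Mutex<HashMap<u64, u32>>,
    dir_cache: Mutex<HashMap<u64, Arc<Vec<Entry>>>>,
    next_fh: AtomicU64,
}

impl ToyVfs {
    pub fn new(children: HashMap<u64, Vec<Entry>>) -> Self {
        ToyVfs {
            children: Mutex::new(children),
            open_handles: Mutex::new(HashMap::new()),
            dir_cache: Mutex::new(HashMap::new()),
            next_fh: AtomicU64::new(1),
        }
    }

    /// Snapshot the children, allocate a fresh fh, and pin the inode.
    /// Returns None if `ino` is not a known directory.
    pub fn opendir(&self, ino: u64) -> Option<u64> {
        let snapshot = self.children.lock().unwrap().get(&ino)?.clone();
        let fh = self.next_fh.fetch_add(1, Ordering::Relaxed);
        *self.open_handles.lock().unwrap().entry(ino).or_insert(0) += 1;
        self.dir_cache.lock().unwrap().insert(fh, Arc::new(snapshot));
        Some(fh)
    }

    /// Return the cached snapshot, or None if `fh` was not issued by opendir
    /// (the caller is then expected to fall back to a fresh snapshot).
    pub fn readdir_snapshot(&self, fh: u64) -> Option<Arc<Vec<Entry>>> {
        self.dir_cache.lock().unwrap().get(&fh).cloned()
    }

    /// Drop the snapshot and release the pin taken by opendir.
    pub fn releasedir(&self, ino: u64, fh: u64) {
        // Only unpin if this fh really held a snapshot.
        if self.dir_cache.lock().unwrap().remove(&fh).is_some() {
            let mut pins = self.open_handles.lock().unwrap();
            let now_zero = pins
                .get_mut(&ino)
                .map(|count| {
                    *count = count.saturating_sub(1);
                    *count == 0
                })
                .unwrap_or(false);
            if now_zero {
                pins.remove(&ino);
            }
        }
    }

    /// True while at least one directory handle keeps `ino` pinned.
    pub fn is_pinned(&self, ino: u64) -> bool {
        self.open_handles.lock().unwrap().contains_key(&ino)
    }
}
```

Because nothing here is FUSE-specific, an adapter for another protocol could call the same three methods from its own handle lifecycle hooks, which is the motivation given in the commit message.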
XciD changed the title from "perf(fuse): cache readdir snapshot on opendir, paginate from cache" to "perf(vfs): readdir snapshot cache, opendir/releasedir on VirtualFs" on May 20, 2026