Skip to content

amuleapi REST API + SSE daemon + EC search-lifecycle additions#201

Open
got3nks wants to merge 35 commits into
amule-org:masterfrom
got3nks:feature/amuleapi-v0
Open

amuleapi REST API + SSE daemon + EC search-lifecycle additions#201
got3nks wants to merge 35 commits into
amule-org:masterfrom
got3nks:feature/amuleapi-v0

Conversation

@got3nks

@got3nks got3nks commented Jun 19, 2026

Copy link
Copy Markdown

Supersedes #132.

Summary

This PR ships a standalone REST API + Server-Sent Events daemon, amuleapi, that talks to amuled as a normal EC client. The API lives at /api/v0/*; the SSE stream is at /api/v0/events. Full per-endpoint contracts live in docs/api/REFERENCE.md; the SSE event catalog with Last-Event-ID reconnect semantics lives in docs/api/EVENTS.md; first-run setup in docs/QUICKSTART-AMULEAPI.md. ETag (snapshot-versioned, memoised on (target, snapshot_at)), CORS (preflight + credentialed origin echo with allowlist), and the SSE framing/reconnect protocol are all documented under those two files.

Why standalone instead of bolting onto amuleweb

The first cut of this work was implemented inside amuleweb. We scrapped it for three reasons.

Stable amuleweb stays untouched. Merging this PR adds a new binary; it does not modify amuleweb. A 3.0.1 bugfix release can ship from master without any regression risk against the amuleweb users have been running for years. amule can ship both binaries side-by-side for a transitional period; whoever wants the REST surface enables amuleapi, whoever's on amuleweb's PHP keeps it. Operators can adopt at their own pace.

Concurrency efficiency. amuleweb runs a single long-lived EC connection too, but every request handles synchronously on the wx event-loop thread (no per-request thread spawn — see src/webserver/src/WebSocket.cpp:282 where ProcessURL is dispatched inline) and each PHP page does its own SendRecvMsg_v2 roundtrips against amuled with no cross-request snapshot cache. Adding a long-lived SSE channel that drains events for the lifetime of an EventSource would pin the event-loop thread, and bolting on the snapshot cache the REST path needs is a non-trivial refactor of the bespoke PHP interpreter. amuleapi takes the opposite shape: one EC connection feeding a 1 s refresher loop that maintains a coherent state snapshot every HTTP handler reads under a shared_timed_mutex, plus an in-process event bus the refresher publishes deltas to for SSE. No bespoke PHP runtime, no per-request EC roundtrip storm, no event-loop thread to tie up.

Modular architecture eases future integration into amuled. The HTTP, SSE, auth, ETag, state-snapshot, and event-bus scaffolding (~6000 LOC across 12 files) is EC-agnostic and ships into any future in-process integration unchanged. The EC plumbing is concentrated in three files (Refresher.cpp, RefresherTick.cpp, App.cpp's entry point) plus every individual mutation and lazy-cache handler in Api.cpp. Inside amuled those become direct model accesses against CamuleApp::theApp() — the code gets simpler (no wire-format walking) but each handler needs touch-up. This is not a deliverable for this PR — just a property the architecture preserves.

Performance model — refresher vs lazy

Two distinct caching paths land in this PR.

Refresher-driven items (every tick, snapshot-coherent). /downloads, /shared, /servers, /clients, /status, /categories are populated by a 1 s loop that runs two EC calls per tick: EC_OP_GET_UPDATE at EC_DETAIL_INC_UPDATE for the four list substructs, plus EC_OP_STAT_REQ for the status envelope. The EC_DETAIL_INC_UPDATE detail level uses amuled's per-client CValueMap to ship field-level deltas — first-encounter rows carry full identity (NAME, HASH, SIZE, ED2K_LINK), subsequent ticks carry only the bytes that actually moved. The EC_DETAIL_INC_UPDATE mechanism itself dates to 2005, but the wave of fixes that landed in 3.0.0 (full-detail emission for unseen ECIDs, the EC skip-unchanged series for per-file change tracking, alive-marker handling) are what make this work cleanly for a fresh subscriber on a cold start. One wire roundtrip per second, packet size proportional to actual change. The refresher then computes a diff against the prior snapshot and publishes one event per changed entity onto the in-process event bus via a single batched PublishBatch call, so SSE subscribers see push notifications within ~1 s. Mutating REST calls (POST/PATCH/DELETE) call RefresherTick synchronously on the way out (under the same m_ec_mtx), so the response and the next GET both reflect post-mutation state.

Lazy-fetched items (TTL-coalesced, mutation-invalidated). /stats/tree, /stats/graphs/{graph}, /logs/serverinfo, and /search/results fetch on demand with a 1 s single-flight TTL cache (CTtlCache::GetOrFetch). The first concurrent request runs the EC roundtrip; every other reader hitting the same endpoint in the same second parks on a condvar inside the cache, then reads the just-stored value — no duplicate roundtrips, no lock held across the EC call (the cache mutex is dropped during fetch so a 30 s amuled stall on /stats/tree doesn't park every reader of unrelated lazy endpoints). Mutation handlers explicitly invalidate the affected lazy cache: POST /search invalidates m_search_cache, DELETE /logs/serverinfo invalidates m_server_info_cache, etc. Smoke tests verify the next read after a mutation sees fresh data, not the pre-mutation snapshot.

The ETag layer caches (target, snapshot_at) → MD5 so safe-method GETs on a snapshot-stable target reuse the validator without re-hashing the body. If-None-Match matches → 304 with the ETag preserved per RFC 7232 §4.1.

Wire-level surface in numbers

  • 34 REST endpoints across system / auth / downloads / clients / shared / servers / categories / preferences / network control / Kad / logs / stats / search
  • 1 SSE endpoint with 6 channels (downloads, shared, servers, clients, status, logs) and the synthetic always-delivered resync event
  • JWT (HS256, 24 h lifetime, mandatory iat, lifetime cap enforced on Verify) via either Authorization: Bearer or an HttpOnly SameSite=Strict cookie
  • Two-tier rate limiting: dedicated counter for /auth/login (operator-configurable thresholds) plus a generic 30-failures/60s lockout across every other 401-returning endpoint
  • Hard caps at every layer: 16 KiB header section, 1 MiB request body, 32 JSON-nesting levels (DoS defence in front of picojson's recursive descent), 4 KiB JWT length, 32 SSE channel tokens

Testing

Two layers in this PR; both green at the tip.

C++ unit tests via ctest — 21 binaries, including JwtTest (27 cases covering crypto identity, alg/typ/iat invariants, depth-cap rejection, base64url edge cases), AuthTest (rate-limit clock injection so the suite runs in 0.44 s instead of 6.5 s), EventBusTest, EventDiffTest, StateTest (concurrent reader/writer tear-detection at 1000+ observations), RefresherTest, AmuleApiConfigTest (0600 enforcement on POSIX), JsonWriterTest, EtagTest, PathPatternsTest. The suite shapes the binary's contract; CI runs ctest on the same matrix it builds.

End-to-end curl regression matrix — 26 phase scripts under unittests/curl-tests/amuleapi/, 660+ assertions total, driven by run-all.sh. The orchestrator brings up a fresh amuleapi daemon per phase (fresh config dir, fresh rate-limit buckets, fresh state cache) against a configured amuled, runs the phase script, aggregates pass/fail. This is the "does the wire contract actually match the docs" layer — every endpoint, every error path, every method gate, every CORS preflight, every SSE replay scenario.

CI artifacts

Build matrix at https://github.com/got3nks/amule/actions/runs/27831394815 — AppImage (x86_64 + aarch64), Flatpak (x86_64 + aarch64), macOS .app (arm64 + x86_64), Windows portable .zip (arm64 + x64). Reviewers can grab the binary for their platform from the run's artifacts and exercise it against any amuled.

Frontend plan

A separate effort (not in this PR, not by me) will build a plain JS + CSS + HTML SPA that consumes this API. Two-phase rollout:

Development phase — host the SPA separately, CORS at amuleapi. Set AllowCORS=1 and CorsOriginAllowlist=https://your-app-host in amuleapi.conf and serve the SPA from any web server, including python -m http.server against a build directory. The CORS bundle already echoes Access-Control-Allow-Credentials: true so cookie auth works cross-origin out of the box.

Deployment phase — fold static-file serving into amuleapi. Add a [Server]/StaticRoot=/path/to/dist config key and a fallback handler in DispatchToHandler that resolves non-/api/v0 paths against the configured root (with the path-traversal guard LooksMalicious(path) already applies, a MIME-type table, and SPA-history fallback to index.html). The architecture already has the pieces: CHttpServer::Response.body is a std::string, content_type is per-response. This is a future PR; the current scope is "REST + SSE surface ready to serve".

Transitional period

Everything in this PR is in flight in the sense that the API surface is /api/v0/, frozen against breaking changes for the lifetime of v0, but the architecture is up for discussion. If a better design appears mid-3.x, we can pivot — amuleapi is opt-in via BUILD_AMULEAPI=YES and operators not running it are unaffected. The standalone-daemon shape minimises the cost of a rethink relative to what an amuleweb merger would have locked in.

got3nks added 16 commits June 20, 2026 18:59
Adds a new daemon `amuleapi` that connects to amuled via EC and serves a
versioned JSON REST API plus a long-lived Server-Sent Events stream over
HTTP. Built on Boost.Beast; runs as its own process so it can coexist
with amuleweb against the same amuled instance on independent ports
(amuleweb 4711, amuleapi 4713).

Wire-level guarantees:

- JWT-based auth with admin / guest roles; session-cookie fallback for
  browser clients
- Per-route rate limiting + IP-based login lockout
- Conditional GET on every read (ETag + If-None-Match -> 304)
- Live event stream with `Last-Event-ID` reconnect replay and a typed
  `resync` event for cache invalidation (gap detection + cross-restart
  ID reset)
- Refresher polls EC every 1 s and emits typed diff events:
  download_added / _updated / _removed and the same shapes for shared
  files, server list entries, peer clients; status_changed for
  connection / kad / speed rollups; log_appended for amuled log tail
- Optional CORS allowlist via amuleapi.conf[Server]/CorsOriginAllowlist
  (off by default; empty list = echo any origin in a cookie-compatible
  way; non-empty = strict per-origin allowlist)
- Lazy-fetch TTL coalescing for stats / search / serverinfo so multiple
  concurrent reads share one EC roundtrip

Concurrent REST clients share one EC connection to amuled via a
process-wide mutex on every SendRecvSerialized; amuled's
single-reader CRemoteConnect requirement stays intact under load.

API contract lives under docs/api/ (per-endpoint fragments).
First-run setup: docs/QUICKSTART-AMULEAPI.md. English man page at
docs/man/amuleapi.1.in.

New shared static lib libwebcommon collects the helpers amuleweb and
amuleapi both need (ETag, path patterns, JSON writer) so neither code
path re-implements them.

Tests:

- 10 muleunit C++ suites: AmuleApiConfigTest, AuthTest, EtagTest,
  EventBusTest, EventDiffTest, JsonWriterTest, JwtTest,
  PathPatternsTest, RefresherTest, StateTest
- 18 curl-driven HTTP smokes under unittests/curl-tests/amuleapi/
  exercising every endpoint end-to-end against a live amuled, plus a
  run-all.sh orchestrator that brings up a fresh daemon per phase

Packaging: amuleapi rides every existing recipe (macOS .app via the
build.sh bin_paths array, AppImage via EXTRA_BINS + argv[0]-dispatch
AppRun, Flatpak local + Flathub manifests, Windows NSIS via the
recursive File /r). docs/INSTALL.md gains a BUILD_AMULEAPI row.

Disabled by default in cmake; enable with -DBUILD_AMULEAPI=YES or
-DBUILD_EVERYTHING=YES. The CI build matrix (ccpp.yml) now sets
BUILD_AMULEAPI=YES so every push exercises the new compile path.
…review

Adds the endpoints the PR amule-project#132 review pass flagged as missing or
shape-mismatched against the agreed contract, plus three operator
ergonomics tweaks:

Missing endpoints now shipped:
- POST /shared/reload — rescan share roots (EC_OP_SHAREDFILES_RELOAD)
- POST /servers/update — refresh server list from a `server.met` URL
- POST /servers/<ip>:<port>/connect — address-keyed alias for the
  existing ECID form; resolves <ip>:<port> against the cache and
  delegates to the ECID handler
- DELETE /servers/<ip>:<port> — same alias for the removal path
- DELETE /logs/amule — EC_OP_RESET_LOG plus a local
  CState::ClearAmuleLog so the next GET returns the post-reset
  state immediately (the refresher's incremental append-only path
  can't shrink the cache)
- DELETE /logs/serverinfo — EC_OP_CLEAR_SERVERINFO plus
  TtlCache::Invalidate on the server-info lazy cache; without the
  invalidate, the next GET would keep serving stale text until the
  1 s TTL expires

Body / shape additions:
- POST /downloads accepts `{"links":[...]}` (RFC §4.2 shape) in
  addition to the existing `{"ed2k_link":"..."}` singular form;
  mixing both in one request is a 400. Partial success in batch
  mode returns 207 with `accepted` / `failed` / `failed_links`
- POST /networks/disconnect now reads an optional
  `{"network":"ed2k"|"kad"|"both"}` selector and routes to the
  matching EC opcode (SERVER_DISCONNECT / KAD_STOP / DISCONNECT);
  no body or `"both"` preserves the original parameterless contract
- GET /status now includes `kad.network:{users,files,nodes}` —
  the rollup the RFC §4.1 contract specifies. The data was always
  available (KadSnapshot already parses EC_TAG_STATS_KAD_USERS /
  _KAD_FILES / _KAD_NODES on the EC_OP_STAT_REQ at EC_DETAIL_FULL);
  only the /status response was missing it
- GET /clients gains `?filter=uploads|downloads|active` to subset
  the peer list by transfer state (replaces the legacy `/uploads`
  page server-side)
- GET /events gains `?channels=<csv>` to subscribe to a subset of
  the event types. Synthetic `resync` events are delivered
  regardless of filter (their purpose is a cache-invalidation
  signal the client cannot opt out of)

Mutation cache hygiene: every mutating endpoint either runs an
inline RefresherTick to repopulate the State-backed cache (which
also fires the corresponding SSE diff events) or, for the two
lazy-cached endpoints, explicitly invalidates the TtlCache so the
next GET re-fetches.

phase11.sh smoke (30/30 PASS) exercises each new endpoint and
includes a post-mutation freshness check on /logs/amule and
/logs/serverinfo — fast GET immediately after the DELETE must
return the empty state, not the pre-reset cache.
Builds were green on macOS Release but failed on Ubuntu and Windows
Release/Debug and macOS Debug:

- src/webapi/Refresher.h declared `std::vector<std::string>` without
  including <string>. libc++ (macOS) brings <string> in transitively
  through <vector>; libstdc++ on Ubuntu and MinGW do not, so every TU
  that included Refresher.h failed with `'string' is not a member of
  'std'`.

- src/webapi/State.cpp used `std::unique_lock` without including
  <mutex>. State.h ships <shared_mutex> which is enough for
  shared_lock + shared_timed_mutex, but libstdc++ does not re-export
  unique_lock from there, so the cpp failed to compile on Ubuntu and
  MinGW with `'unique_lock' is not a member of 'std'`.

- src/webapi/Api.cpp used `std::ostringstream` without including
  <sstream>. Worked transitively from picojson.h on most platforms;
  add the explicit include defensively.

- unittests/tests/CMakeLists.txt: RefresherTest links libec.a but did
  not link LoggerConsole.cpp. Release builds preprocess libec's
  debug-emit paths to `do {} while(0)` via the ECLog.h define, so the
  symbols (DoECLogLine, theLogger, CLogger::AddLogLine) are unused;
  Debug builds keep the calls and require the symbols. Mirror
  amuleapi's own webapi/CMakeLists.txt which already pulls
  LoggerConsole.cpp in.

No behavior change; only build hygiene to get the CI matrix green.
High — must-fix:
- amuleapi.conf is now mode-checked on every load and seeded with
  mode 0600 on first run. The conf carries the EC password in
  plaintext (App.cpp:202-210 MD5s it for the base class' hashed
  field) so it gets the same owner-only enforcement as the
  jwt-secret and passwords files. EnforceOwnerOnly() runs on the
  conf in LoadAmuleapiConf after the seed step.
- --set-admin-pass / --set-guest-pass now propagate exit codes via
  std::exit(rc). Previously OnInit() returned false on failure,
  which cancels the run but still produces exit 0, so operators
  scripting `amuleapi --set-admin-pass=... && systemctl restart
  ...` could miss disk-full / mode-bit-rejection failures.
- docs/QUICKSTART-AMULEAPI.md no longer lists /uploads. The endpoint
  was retired in Phase 4g and /clients (with the new ?filter=
  query) covers the per-peer surface.

Medium:
- Replaced the "is the field at its default value?" heuristic in
  LoadAmuleapiConfig with wxCmdLineParser::Found() predicates for
  --host / --port / --password. The old code would silently
  overwrite an operator-supplied `--host=127.0.0.1` with the conf
  value because it conflated "passed nothing" with "passed the
  default verbatim."
- Dropped the no-op --daemon switch. The flag was wired to
  m_cliForeground = false but the field was never read. Mention
  in --foreground's help still notes "default."
- WritePasswordsFile / WriteJwtSecretFile route through a new
  WriteFileAtomic0600 helper that writes to a sibling .tmp,
  fsync()s, then rename(2)s into place. A short write, signal, or
  crash mid-write now leaves the original file intact instead of
  truncating the only admin/guest credentials the daemon has.
- Auth.cpp + Api.cpp pick up the #ifdef _WIN32 / #include
  <strings.h> shim that libwebcommon/HeaderParse.cpp already uses
  for strncasecmp. POSIX puts the symbol in <strings.h>; glibc
  also re-exports via <string.h> but musl and BSDs don't.
- run-all.sh computes the repo root from $SCRIPT_DIR instead of
  hard-coding the original author's local path. AMULEAPI_ROOT
  remains an env override.

Low — nits-as-comments + one test:
- Comment at the SSE drain loop documents that ev.name is server-
  controlled (compile-time literals from EventBus::Publish call
  sites) so the un-escaped `event: <name>` write is safe in
  practice — and what to sanitize if that ever stops being true.
- Comment at CRevocationSet::GcExpired() flags the O(n) sweep as
  load-bearing only at single-operator scale and points to the
  min-heap upgrade if the population ever climbs.
- Comment at the per-streaming-session std::thread spawn calls out
  that the BindAddress=127.0.0.1 default is load-bearing against
  a thread-per-connection DoS, and points at the async-on-
  io_context migration that would close it off.
- JwtTest gains direct boundary tests for Base64UrlDecode's
  len%4==1 and non-zero-residue rejections. Both craft a token
  with malformed b64 in the header section + a valid HMAC over
  the same signing input, so the constant-time MAC compare passes
  and Verify() actually reaches Base64UrlDecode (where the
  rejection must come from).
- Comment at MakeSetCookie notes that the Max-Age=0 boundary on
  an already-expired expires_at is deliberate — the cookie
  expires on receipt, which is the right behaviour for
  logout-on-issue-of-expired-token paths.

Drive-by CI fixes (caught while validating the review fixes):
- EventBus::Publish raced on ID assignment: fetch_add lived
  outside the lock so a publisher could win the lock-order race
  but lose the id-order race, and a subsequent Drain would emit
  events in publish order with non-monotonic ids — failing
  EventBusTest's ConcurrentPublishersHaveDistinctIds on the
  macOS Debug runner. Moved fetch_add inside the lock so id
  assignment is atomic with the push_back.
- Refresher.cpp's FULL-detail GetIPv4Data fallback path was
  removed. amuleapi only ever asks for EC_DETAIL_INC_UPDATE so
  the fallback was dead code in production, and it tripped a
  libec Debug-build assertion on RefresherTest fixtures (which
  pack the ECID as a uint32 in the EC_TAG_SERVER slot, exactly
  the not-IPv4-not-UNKNOWN case the assertion guards).
- AmuleApiConfigTest gains #include <cstring> for std::strlen
  (Windows MinGW does not bring it in transitively).
High — must-fix:
- SSE channel-filter heartbeat starvation. The drain loop now tracks
  the wall-clock since the last byte written and emits `: keepalive`
  whenever it falls behind the 15 s budget, on EITHER the
  drained-and-emitted, drained-but-all-filtered, or
  drained-nothing branch. Previously a busy bus paired with a
  `?channels=` that filtered every event left the wire silent
  indefinitely — Drain returned immediately each time so the
  Drain-timeout-based heartbeat never fired, and NAT / load
  balancers / EventSource drop idle TCP connections after
  30-60 s.

Medium:
- HandleStatus now reads through a new CState::Dashboard() accessor
  that returns (status, kad, snapshot_at, ec_connected) under a
  single shared_lock. Previously each accessor took its own lock so
  a refresher tick could interleave between Status() and Kad() and
  produce a /status response where `kad.network` described tick
  N+1 while `ed2k.*` / `speeds.*` described tick N.
- HandleDownloadAdd no longer 503s mid-batch on an EC disconnect.
  Per-link results split into three buckets — accepted_links,
  failed_links, disconnected_links — and the response surfaces
  all three in counters plus arrays. 202 (all clean), 207
  (mixed), 503 (every link blocked by the same EC outage —
  nothing landed). Caller can retry just the disconnected subset
  once /status reports ec_connected=true.

Low — nits-as-comments:
- event_channel lambda now documents the bare-name fallback:
  every bus event MUST contain `_`, the prefix names the channel.
  resync stays the only no-underscore name and is published
  directly to the writer (bypassing EventBus::Publish, hence
  the filter), so the lambda's fallback is a defensive net for
  future names rather than a routed code path.
- phase11.sh /clients?filter=active assertion: the prior comment
  ("no peer is in both states simultaneously") was contradicted by
  the assertion's tolerance for double-counted peers. Re-stated as
  `active ∈ [max(up, down), up+down]` to match the semantics
  (peers can be uploading AND downloading at once; active is the
  set union, sum is the multi-set sum).
- docs/QUICKSTART-AMULEAPI.md gains a "Notes on a few responses"
  section explaining the 207 multi-status on POST /downloads
  (the 503 reservation is documented too), and a security-notes
  section calling out POST /servers/update as an
  admin-authorized SSRF surface and the load-bearing
  BindAddress=127.0.0.1 default.

Drive-by CI fixes:
- mingw-w64 link failures on JwtTest / EtagTest / JsonWriterTest
  / PathPatternsTest / AmuleApiConfigTest / AuthTest / StateTest
  / EventBusTest / EventDiffTest. muleunit is an OBJECT library
  that pulls in src/libs/common/MuleDebug.cpp directly, which
  references CFormat (defined in src/libs/common/Format.cpp).
  On macOS/Ubuntu the symbols come through via wxWidgets / libc
  link order; MinGW's strict linker rejects. Added mulecommon to
  each test's target_link_libraries (CTagTest already pulls
  Format.cpp via its own sources list, so the pre-existing tests
  weren't affected).
- phase4c-bis.sh used to assert DELETE /logs/{amule,serverinfo}
  → 405. Phase 11 made DELETE a real mutation, so the test
  now asserts PATCH → 405 (the logs are read-and-reset only,
  never partially mutable) — same intent, post-Phase-11 contract.

Verified locally:
- ctest: 21/21 (all webapi + libwebcommon suites)
- run-all.sh curl matrix: every phase smoke green standalone
  (phase8d occasionally flakes on the first hit after a daemon
  restart — passes on re-run; not in any failure path)
The test was hard-coded to put its per-test config dir under `/tmp`,
which doesn't exist on Windows MinGW (the CI matrix's mingw-w64
Release/Debug jobs failed every test with "The system cannot find
the path specified" from wxRemoveFile / wxRmdir / wxMkdir).

Switched to `wxStandardPaths::Get().GetTempDir()`, which returns
`/tmp` on POSIX and `%TEMP%` on Windows. Same intent, portable.

Also using `wxGetProcessId()` instead of `::getpid()` so the source
doesn't carry a POSIX-only call site outside the `#ifndef _WIN32`
include guard.

ctest 10/10 locally; no shape change to the tests themselves.
Two pre-PR blockers:

- LastSeenState race. Every Phase-5 mutation handler calls
  RefresherTick() inline on the HTTP-server worker thread; the
  wxApp refresher loop calls it once per second. Both paths used
  to end in EmitDiffsAndUpdate(bus, m_last_seen, state), which
  reads and mutates the per-substruct maps in m_last_seen without
  any internal lock. Two threads doing std::map insert/erase on
  the same map is UB; an attacker (or operator) coupling a busy
  client to inline mutations could corrupt the diff baseline and
  crash the daemon. Split: RefresherTick() no longer touches
  m_last_seen at all. A new EmitDiffsForEventBus() helper does the
  walk; the wxApp refresher loop in App.cpp (the single writer
  thread) calls it after each successful tick. SSE subscribers
  see post-mutation diffs on the next natural tick (≤1 s) instead
  of immediately — acceptable cost for memory safety.

- SIGPIPE never masked. On Linux, an SSE peer that disappears
  mid-write raises SIGPIPE on the next asio::write to the closed
  fd. The streaming-handler socket writes don't pass MSG_NOSIGNAL,
  so without ignoring SIGPIPE the default disposition kills the
  daemon on every dropped EventSource. OnRun() now sets
  std::signal(SIGPIPE, SIG_IGN) alongside the existing
  SIGINT/SIGTERM/SIGHUP handlers; the writes return EPIPE and
  the streaming-handler loop bails on the next writer.Alive()
  poll.

libwebcommon High:

- CJwt::CJwt(secret) now throws std::invalid_argument on an empty
  signing key. CryptoPP's HMAC accepts a null/zero-length key
  without complaint, so a config bug that truncated the
  amuleapi-jwt-secret to 0 bytes used to produce a daemon that
  happily issued and verified tokens against an empty key.
- WriteEscapedString now substitutes U+FFFD (replacement char)
  for unpaired UTF-16 surrogates instead of falling through to
  BMP serialization and producing invalid UTF-8 once the buffer
  flushes. Affects only paths that route raw wxStrings through
  the JSON writer with broken Unicode input.
- FindCookieValue now strips RFC 6265 §5.2 OWS from both ends of
  the value (`name= value ` → `value`, not ` value `). Without
  it a JWT cookie set by a client that inserted a leading space
  after `=` would fail ConstantTimeEquals on every Verify.

libwebcommon Medium:

- ConstantTime.h gains a precondition block on the wxString
  overload: the length-mismatch check is NOT constant-time but
  is safe today because every caller compares fixed-shape inputs
  (32-char MD5 hex, 43-char b64url MAC). Future length-variable
  callers must pad or accept a length-side-channel leak.
- CJwt::Verify now allows a 5-second exp leeway and rejects an
  iat claim more than 60 s ahead of the verifier's clock. Today
  the issuer and verifier share a clock so the skew is zero;
  tomorrow they may not (federated tokens, reverse-proxy auth
  handoff).

webapi High:

- CRateLimiter is now actually sliding, not tumbling. The legacy
  `failure_count + window_start` shape reset wholesale when the
  current window expired, letting threshold-1 attempts in the
  tail of window N + threshold-1 in the head of window N+1 slip
  through. Replaced with a std::deque<std::time_t> per IP: each
  NoteFailure evicts stamps older than window_seconds and
  appends now; lockout fires when the live count crosses
  threshold. Two new AuthTest cases pin (a) lockout expiry after
  lockout_seconds and (b) the split-attempt regression that the
  old tumbling impl would have passed.
- SSE concurrent-session cap. A new process-wide atomic counter
  bounds streaming sessions at kMaxConcurrentStreamingSessions
  (32). The slot is acquired before the per-session OS thread is
  spawned; over-the-cap requests get 503 with
  `error.code=sessions_exhausted` + Retry-After:10. The cap
  caps the thread-per-connection DoS amplifier that becomes
  reachable if an operator switches BindAddress to a non-
  loopback. A startup WARN line also fires when BindAddress
  isn't loopback, so the operator sees the trade-off in the
  daemon log.
- Session destructor now asserts the m_worker_exited atomic
  before detach()ing the thread handle. The "worker holds
  shared_from_this() across its run, so the dtor only sees
  it after the worker exited" invariant is now enforced at
  runtime instead of by-comment; a future refactor that drops
  the shared_from_this() capture fails loudly rather than
  mysteriously deadlocking.

data layer (covered by BLOCKER 1 + comment fix):

- m_partfile_rle comment now matches reality: the mutator is
  CState::MutateDownloads under the State write lock, NOT
  m_ec_mtx. m_ec_mtx is incidentally held across the same call
  stack but isn't the actual serializer.

webapi Medium:

- ParseEcidPath now uses std::strtoull with errno + ULLONG_MAX
  + > 0xFFFFFFFF gate. On Windows `unsigned long` is 32-bit, so
  the prior > 0xFFFFFFFFu overflow check was a tautology and a
  bogus high-value path segment would saturate to ULONG_MAX and
  silently match an actual ECID 0xFFFFFFFF.
- MakeSetCookie / MakeClearCookie use std::string instead of a
  256-byte snprintf buffer. Today's HS256 JWT fits; tomorrow's
  longer payload (extra claim, alg switch) won't truncate.

tests:

- AuthTest gains (1) RateLimiter_LockoutExpiresAfterLockoutSeconds
  (1.1 s real-time test for "infinite lockout" regression),
  (2) RateLimiter_SlidingWindowSplitAttemptsStillLockOut for the
  ring-buffer behaviour the old tumbling impl would have failed,
  and (3) RevocationListBlocksOtherwiseValidToken — the cross-test
  the review flagged as missing, composing CJwt::Verify with
  CRevocationSet so a regression in either component's lookup
  fails the test.
- run-all.sh: pkill scoped to `--config-dir=/tmp/amuleapi-regtest`
  so a dev's open editor doesn't get killed. Daemon readiness wait
  is now `curl /version` polled with timeout instead of
  `sleep 1` (matches the line-19 comment about cold-start cost).
  A 429 probe after a failed phase prints a "phase3 left the
  lockout armed; restart amuleapi" tip so the next dev doesn't
  lose an hour.
- phase5a.sh / phase8b.sh / phase11.sh: cleanup trap now also
  DELETEs the Ubuntu ISO test partfile (when ADMIN_TOKEN is
  available) so a failed run doesn't survive into the next as
  6.6 GB of disk pressure on the Windows VM.

docs/QUICKSTART-AMULEAPI.md:

- Drop the broken `docs/api/` cross-reference (the directory
  doesn't exist yet); point at the source-of-truth handler
  routing in `src/webapi/Api.cpp` instead.
- Add the amuled-cohabitation callout above the config-dir
  table (same data dir as amule.conf; no collisions; both daemons
  read disjoint files).
- Reword the `--set-admin-pass` / `--set-guest-pass` block to
  explicitly say (a) no HTTP server is started, (b) no EC
  connection is attempted, (c) the exit code reflects success
  / failure so chains like
  `amuleapi --set-admin-pass=... && systemctl restart amuleapi`
  short-circuit on write failure.
- CORS section gains a callout: the empty-allowlist form echoes
  the caller's Origin (not literal `*`), because
  `Access-Control-Allow-Credentials: true` is incompatible with
  the literal wildcard and cookie-auth would break otherwise.

Verified locally:
- ctest: 21/21 (new AuthTest cases included)
- run-all.sh standalone (after restart): phase8a 11/11, phase8c
  7/7, phase8d 9/9, phase11 30/30. (Without a daemon restart
  immediately after phase3 the new sliding-window correctly
  refuses login — that's the intent; the orchestrator's
  per-phase restart clears the bucket.)
- RateLimiter_SlidingWindowSplitAttemptsStillLockOut now actually
  discriminates. The prior timing (window=5, threshold=3, single
  sleep 1.1 s between failures) passed against both the old
  tumbling impl and the new sliding impl — the failures landed
  inside the same window from the OLD code's perspective so the
  reset never fired and the count went 1→2→3 regardless. Tightened
  to (window=3, threshold=3) with failures at integer-second
  offsets t=0, 3, 4, 5 — at t=4 the OLD code triggers the wholesale
  reset and the count drops to 1; at t=5 OLD count=2 (no lockout)
  while NEW evicts the t=0/t=1 stamps and tallies 3 within the live
  3-second window (lockout). Verified with a standalone harness
  replaying the sequence against the OLD impl: locked=NO,
  count=2.
- Session::~Session now wraps the worker-exit invariant in a hand-
  rolled `if (!flag) std::abort();` instead of plain `assert()`,
  so the check survives -DNDEBUG (compiled into every Release /
  RelWithDebInfo build). The worker lambda sets the flag via an
  RAII WorkerExitMarker struct rather than a bare end-of-body
  statement, so a future refactor that adds an `early return`
  after the catch block still trips the flag before `self`
  drops.
- run-all.sh's final-teardown `pkill -f amuleapi` was the last
  un-scoped sweep (the per-phase one at line 53 was already
  narrowed to `--config-dir=/tmp/amuleapi-regtest`). Now the
  teardown uses the same pattern, so a developer with
  `vim path/to/amuleapi.cpp` open keeps their editor across
  full-matrix runs too.
- docs/QUICKSTART-AMULEAPI.md: the cohabitation-with-amuled
  callout now sits ABOVE the per-platform config-dir table (the
  layout the prior commit message claimed but didn't actually
  deliver). The intro line + callout + "default location" + table
  reads more linearly too.
- src/libwebcommon/picojson.LICENSE: reproduce picojson 1.3.0's BSD-2
  notice next to the vendored header, mirroring how muleunit ships
  unittests/muleunit/license.txt. BSD-2 clause 2 obliges any binary
  distribution (AppImage, Flatpak, .deb, macOS bundle, Windows
  installer) to include the notice in materials provided with the
  distribution — without this file the shipped binaries were
  technically out of compliance even though the source-form clause
  was satisfied by picojson.h's in-file comment.
- docs/THIRDPARTY.md: full credits/license reproduction for picojson
  (BSD-2) and muleunit (LGPL v2.1, test-binary-only — explains why
  the relinking clause does not constrain end-user binaries since
  muleunit is never linked into amule/amuled/amulegui/amuleweb/
  amuleapi).
- LICENSE.md: short header pointing at docs/THIRDPARTY.md for the
  third-party components, leaving the GPLv2 text unchanged below.
- src/libwebcommon/Jwt.cpp: count '{' + '[' in each b64-decoded
  JSON section and reject when > 32 BEFORE calling picojson::parse
  on either the header (Jwt.cpp:225) or the payload (Jwt.cpp:250).
  picojson's _parse_array/_parse_object is unbounded recursive
  descent; both parses run before the MAC verdict reaches the
  caller, so an unauthenticated peer could submit a token in
  Authorization: Bearer whose b64-decoded JSON nests deep enough
  to blow the worker thread's stack (musl pthreads cap at 128 KiB).
  Real JWTs are flat — a cap of 32 leaves ample headroom; the
  scan is O(body length) with no allocations.
- src/webapi/Api.cpp: pull the same opener-count cap into
  ParseJsonObjectBody so every mutating-body parse site is covered,
  and route HandleLogin through ParseJsonObjectBody instead of
  calling picojson::parse directly — HandleLogin is reachable
  unauthenticated, so it needs the same defence as the JWT
  surface. Forward declaration in the helpers-anon-namespace block
  near the top of the file so HandleLogin (defined above
  ParseJsonObjectBody) can call it.
- unittests/tests/JwtTest.cpp: two new regressions
  (DeeplyNestedPayloadRejected, DeeplyNestedHeaderRejected) build
  a token whose nesting is 200 levels deep and sign it with the
  matching HMAC so the MAC compare succeeds and the depth cap is
  the only available reject path. Suite goes 23 → 25 cases, all
  21 ctest binaries still green.
Six batches landed against the final pre-PR review. All 21 ctest
binaries still green; JwtTest grew to 27 cases (depth-cap + iat
mandatory regressions); AuthTest dropped from 6.5 s → 0.44 s
after clock injection.

Batch 1 — quick wins
- HttpServer.cpp: route 5xx body through CJsonWriter::ValueString
  (H-S1).
- Auth.cpp Check(): evict stale failure stamps the same way
  NoteFailure does (H-S2).
- EventDiff.cpp: capture-on-first-call thread-id assertion in
  EmitDiffsAndUpdate; hard abort if a second thread publishes
  (H-S3).
- PathPatterns.{h,cpp} + Api.cpp: LooksMalicious(path) defence-
  in-depth — reject NUL / %00 / %2e%2e / literal ".." segments
  before routing (H-S11).
- Jwt.cpp Verify: reject tokens > 4 KiB before any Base64Url walk
  (M-S10).
- Api.cpp ?channels= cap at 32 tokens (M-S13).
- Api.cpp HandleLogout: revoked-token now 200 noop, not 401
  (M-S14).
- Api.cpp: kSessionCookieAttrs as the single source of truth
  for Set/Clear cookie attribute strings (M-S11).
- App.h: rename PartfileRleState() →
  PartfileRleStateRequireStateWriteLock() to encode the
  precondition (M-S2).
- State.cpp: comment the tick-end-stamping skew on m_snapshot_at
  and the "events atomic, state isn't" invariant (M-S3 + M-S12).

Batch 2 — auth / JWT hardening
- Jwt.cpp Verify: iat is now mandatory; additionally cap
  (exp - iat) ≤ TOKEN_LIFETIME_SECONDS + skew so even a
  compromised secret can't mint a 100-year token (H-S10).
- Api.cpp HandleLogin: drop the JWT from the JSON body by
  default — browsers get the cookie only; bearer clients opt
  in via Accept: application/jwt or ?type=bearer (H-S12).
- Jwt.h: signing secret now stored as CryptoPP::SecByteBlock so
  the live key gets scrubbed at destruction; constructor also
  zeroes the caller's vector after copying (M-S1).
- AmuleApiConfig.cpp: first-run amuleapi.conf write routed
  through WriteFileAtomic0600 — no truncated configs on mid-
  write crash (M-S5).
- Api.cpp: new generic 401 rate limiter (30/60s, 5-min lockout)
  applied to every auth-protected endpoint via
  AuthenticateRequestRateLimited, including HandleLogout's
  inline-soft path (M-S7).
- Api.cpp ResolveServerEcidByAddress: drop the s.address
  string fallback; require IPv4-shaped match (M-S9).
- curl-tests: opt every login URL into ?type=bearer so the
  script harness keeps reading the token out of the body.
- unittests/tests/JwtTest.cpp: two new regression cases
  (MissingIatRejected, ExpIatLifetimeCapExceeded) and
  EmptyJtiRejected updated to include a valid iat so it
  exercises empty-jti rather than the new mandatory-iat path.

Batch 3 — SSE concurrency
- Api.cpp DispatchEvents: post-Drain gap detection — if the
  cursor fell off the ring between drains, emit synthetic
  resync (reason=gap) and restart from newest. Previously
  only reconnects caught gaps; a cold-start tick on a 5K-
  download library would publish ~5K events, evict 4900,
  and the live drainer would silently miss them (H-S4).
- EventBus.{h,cpp}: Shutdown() + IsShutdown() latch. App's
  OnExit calls Shutdown() before tearing down the
  dispatcher; Drain wakes blocked subscribers, the SSE
  worker loop polls IsShutdown() and exits — closes the
  UAF window where a detached worker held m_dispatcher
  across dispatcher.reset() (H-S7).
- HttpServer.{h,cpp} + Api.{h,cpp}: new StreamingPreflight
  callable runs synchronously on the I/O thread BEFORE the
  SSE worker thread spawns and BEFORE the 32-slot
  streaming-session budget is touched. Unauth peers get
  401/429 without burning a slot for the read-timeout
  window (H-S8).
- App.cpp refresher loop: wall-clock-bounded to a 1 s
  target cycle (was a fixed 4×250 ms sleep regardless of
  tick duration) + WARN log on any tick > 3 s — typically
  signals EC-mutex contention (H-S5).

Batch 4 — cache / perf
- TtlCache.h: split GetOrFetch into "fresh? return cached"
  vs "claim inflight via condvar, drop mu, run fetch,
  broadcast". A 30 s amuled stall on /stats/tree no longer
  parks every concurrent reader (H-S6).
- Api.h + Api.cpp: snapshot-versioned ETag memoization keyed
  on (request target, m_state.SnapshotAt()). Previously every
  200 GET/HEAD ran MD5 over the (potentially multi-MB) body;
  on a 10K-shared-file daemon that was the dominant CPU cost
  on the safe-method path (M-S6).
- Pagination (M-S8) deferred per user direction.

Batch 5 — tests
- Auth.h + Auth.cpp: clock injection into CRateLimiter via
  optional Clock parameter (defaults to std::time(nullptr));
  AuthTest's sliding-window / lockout-expiry tests now run
  with a controllable clock instead of real-time sleeps —
  AuthTest dropped from 6.5 s → 0.44 s (H-T1).
- JwtTest MalformedBase64Rejected: drop the misleading
  "constant-time" framing since no timing assertion exists
  (H-T2).
- phase4.sh: poll /status until 200 instead of sleep 2 —
  doesn't drift to disconnected under Windows VM load
  (H-T3).
- phase4.sh test 4: real guest login instead of the comment-
  only stub; _die if guest pass isn't configured so the
  fixture gap surfaces (H-T4).
- phase3.sh: add 5th-attempt 429 check (was only the 8th)
  so the lockout boundary is pinned (M-TD6).
- StateTest: ASSERT_TRUE(observed > 1000) instead of > 0 so
  the tear-detection harness can't pass on a single
  observation (M-TD7).
- AmuleApiConfigTest: document why the mode-bit test is
  POSIX-only and what a future Windows port would land
  (M-TD8).

Batch 6 — docs / man / cmake
- AmuleApiConfig.cpp: document the deferred --rotate-jwt-
  secret CLI; manual rotation via "delete the file + restart"
  works today (M-S4).
- cmake/options.cmake BUILD_AMULEAPI block: document the
  hard-fail dependency chain (cryptopp via NEED_LIB_EC,
  Boost via unconditional cmake/boost.cmake) so a future
  refactor doesn't accidentally let amuleapi soft-disable
  (M-TD1).
- docs/QUICKSTART-AMULEAPI.md: callout disambiguating
  --port=4712 (EC port amuleapi talks TO) vs the 4713 HTTP
  listener (port REST clients talk TO); curl example
  switched to ?type=bearer (M-TD2).
- docs/man/amuleapi.1.in: drop the non-existent -V short
  flag from SYNOPSIS + OPTIONS; the wired form is
  --version (M-TD3 + M-TD4).
- docs/man/CMakeLists.txt: comment-only ack of the
  translated-manpage gap — adding amuleapi to MANPAGE_BINARIES
  today fails the build because po/manpages-*.po has no
  amuleapi strings. English-only until 3.1; explicit
  wiring instructions in the comment for whoever lands the
  translator pass (M-TD5).
…sweep

Two fixture bugs surfaced when run-all.sh was actually re-run on top
of the sweep commit; both were introduced by the batch-5 changes
that addressed reviewer M-TD6 / H-T4.

- phase3.sh — the 5th-attempt boundary assertion (M-TD6) had the
  off-by-one in the wrong direction. Rate-limiter Check() runs
  BEFORE the password compare; NoteFailure() runs AFTER. So with
  threshold=5, the 5th attempt arms the lockout but still returns
  401 (the password compare failed and the response was already
  prepared); the FIRST 429 lands on attempt 6 when Check() sees
  the armed lockout. The assertion now asserts 401 on attempt 5
  and 429 on attempt 6, which is what the limiter actually
  guarantees.
- phase4.sh — the new guest-bearer check (H-T4) referenced
  $GUEST_PASS without setting a default. run-all.sh configures
  guestpass via --set-guest-pass, but a standalone phase4.sh
  invocation has no GUEST_PASS in the environment and `set -u`
  bails. Default to `guestpass` to match run-all.sh's daemon
  fixture, and document the env var alongside ADMIN_PASS in the
  script header.
…cOpsImpact)

Seven findings from the second-round review. All 21 ctest binaries
still green; run-all.sh full curl matrix still green (660+ assertions
across 26 phase scripts).

- HttpServer.cpp:153 — set explicit m_parser->header_limit(16 KiB).
  Beast's default header_limit varies across versions (8 KiB in
  current upstream, larger on older). Without an explicit cap, a
  drip-feed attacker who slowly streams header lines grows the
  flat_buffer until the 10 s read timeout catches them — but
  that's 10 s × pool of concurrent peers of memory pressure. The
  explicit cap pins the per-peer ceiling at 16 KiB regardless of
  upstream defaults. (amule-project#3)

- HttpServer.cpp:200-220 — replace the e.what()-in-body 500 with
  a generic "internal server error" message; log e.what() server-
  side via stderr. Information-disclosure defence: a future
  header-driven throw site (or picojson's offset-with-byte error
  messages today) would otherwise reflect caller-supplied bytes
  (Authorization fragments, Cookie content) back to the requester
  via the 500 body. The H-S1 fix in the prior sweep handled JSON
  escaping but kept the disclosure surface; this closes it. (amule-project#2)

- App.cpp TextShell — bind-time hard gate. Refuse to start when
  BindAddress is non-loopback AND both admin/guest passwords are
  empty. The prior WARN-only behaviour meant a daemon started
  against a routable interface BEFORE the operator ran
  --set-admin-pass accepted TCP, refused every login (503
  login_disabled), but still exposed the unauth surface
  (/version) and would silently leak any future read-without-auth
  endpoint. Loopback binds are unaffected — that's the
  documented first-run flow. (amule-project#1)

- HttpServer.cpp + App.cpp shutdown — Listener::Stop now cancels
  every live SSE session's socket via a new Session::RequestCancel
  posted onto the session's strand. The new g_live_streams
  weak_ptr registry tracks every session that holds a streaming
  slot. Without this, a worker blocked inside a synchronous Beast
  write to a slow peer would hold the connection until the kernel
  TCP timeout (~minutes) and CHttpServer::Stop would join the
  io_context thread indefinitely waiting for the workers. Now the
  close fires, the write fails EPIPE, the worker exits cleanly,
  the last shared_ptr ref drops, and shutdown completes in
  bounded time. (amule-project#6)

- App.cpp refresher loop — fail-loud on sustained EC blackout.
  After ~30 s of consecutive failed ticks, log a sharp WARN to
  stderr; after ~5 min, set g_shutdownRequested so a supervisor
  (systemd, launchd, docker restart=always) can bring the pair
  back. Previously the refresher would silently loop forever
  retrying a dead amuled, with the daemon serving 503
  ec_unavailable from every endpoint with no visibility into the
  underlying cause. (amule-project#7)

- EventBus.{h,cpp} + EventDiff.cpp — new
  CEventBus::PublishBatch(vector<pair<name,data>>) takes one
  lock acquisition + one notify_all for the whole batch. DiffMap
  now accumulates removed/added/updated tuples and emits via the
  batch API. The cold-start tick on a 5K-download library was
  previously firing ~5K individual notify_all cycles inside the
  refresher loop (each going through every drainer's cv mutex),
  which dominated the tick wall-clock. Single-publisher
  invariant + monotonic id assignment both preserved (the batch
  assigns inside the same lock as the per-event Publish). (amule-project#9)

- App.cpp OnRun — sigaction with explicit sa_mask instead of
  std::signal. std::signal has implementation-defined
  "reset-to-SIG_DFL after firing" semantics on some platforms
  (notably musl/Alpine and legacy BSDs); a second SIGINT could
  default-terminate the daemon mid-shutdown. sigaction with
  SA_RESTART gives portable stay-installed semantics across
  every relevant target and prevents EINTR on blocking EC reads
  during shutdown. Windows lacks sigaction; std::signal fallback
  retained there. (amule-project#5 — Alpine/musl support)
Two new files covering the complete v0 API surface:

- docs/amuleapi/REFERENCE.md (~950 lines)
  Endpoint catalog: every route under /api/v0/, grouped by resource
  (system, auth, downloads, clients, shared, servers, categories,
  preferences, networks, kad, logs, stats, search). Per endpoint:
  HTTP method(s), auth role (NONE/GUEST/ADMIN), query/path params,
  request body schema, complete curl example, sample success
  response, and the error codes the handler can emit. Sections
  upfront for the shared model: auth carriers (bearer vs cookie),
  role gating, rate limiting, JWT structure, error envelope, ETag
  and conditional GET, CORS surface, path validation, request size
  limits. Closing error-code catalog so clients have one table to
  alert on.

- docs/amuleapi/EVENTS.md (~310 lines)
  SSE contract: connect flow, headers, initial reassurance chunk,
  frame format, channel filter (downloads/shared/servers/clients/
  status/logs), 15 s heartbeat semantics, Last-Event-ID reconnect
  matrix, live-path gap detection, and the synthetic resync event.
  Catalogs every event the bus emits with its payload shape:
  download_*, shared_*, server_*, client_*, status_changed,
  log_appended, plus the filter-bypass resync. Closing notes on
  the single-publisher invariant and shutdown behaviour.

Cross-linked from:
- docs/QUICKSTART-AMULEAPI.md (replaced the "OpenAPI is a follow-up"
  hedge with pointers to the new files).
- docs/man/amuleapi.1.in (the man page promised docs/api/; updated
  to point at docs/amuleapi/REFERENCE.md and docs/amuleapi/EVENTS.md).

No source changes; pure docs. Closes the "where's the API spec"
gap that the man page was advertising against an empty directory.
The man page originally promised docs/api/; restore the directory
name to match. Renames the two reference files in place and updates
the cross-links in QUICKSTART-AMULEAPI.md and the amuleapi(1) man
page. No content changes.
The doc examples carried 176.123.5.89:4725 verbatim, lifted from the
phase11.sh curl regression test where it was used because it's a
known-live ed2k server. That's fine for a smoke test but wrong for
user-facing docs — implies endorsement and links a third-party
operator's address to the project. Swap to 203.0.113.5:4242 (RFC 5737
TEST-NET-3, reserved for documentation) throughout REFERENCE.md and
EVENTS.md and align the port in the example status response so the
two examples stay consistent. No structural changes.
…nt cleanup

Code:
- EventDiff.cpp/h: every ToJson(*Snapshot) rewritten to emit the same
  byte-for-byte shape as the matching REST list-item writer in
  Api.cpp. Every Equal(*Snapshot) expanded to compare every emitted
  field so SSE _updated fires on any movement (the original gap was
  shared_updated never firing on xfer counter movement). New
  ToJsonStatusEvent(Status, Kad, ec_connected) produces the nested
  REST /status envelope; new Equal(KadSnapshot). EmitDiffsAndUpdate
  reads state.Dashboard() so status + kad + ec_connected are diffed
  against the prior tick as one rollup. LastSeenState gains
  KadSnapshot kad + bool ec_connected.

- Api.cpp: HandleNetworksConnect accepts {"network":"ed2k"|"kad"|
  "both"} matching /networks/disconnect; routes to EC_OP_SERVER_CONNECT
  / EC_OP_KAD_START / EC_OP_CONNECT. HandleKadConnect +
  HandleKadDisconnect deleted as strict aliases of the granular form.
  /search/results fires EC_OP_SEARCH_PROGRESS in the same single-flight
  TtlCache miss; response carries progress.{percent, complete} so
  empty results aren't ambiguous between "no search", "in flight",
  "finished with zero hits". Api.h adds SearchCacheValue + new
  TtlPair_Search alias.

Tests:
- phase5e.sh: replace dedicated /kad/{connect,disconnect} assertions
  with /networks/{connect,disconnect} body-selector assertions; add
  network=ed2k and bogus-selector negative cases.
- phase6.sh: assert progress envelope on /search/results.

Docs:
- REFERENCE.md: TOC linking every endpoint section; clear_completed
  cross-references DELETE /downloads/{hash}; kad/bootstrap body
  corrected from a wrong {server_address: ...} to the real {ip,
  port} shape; /networks/connect doc updated for the body selector
  and /kad/{connect,disconnect} removal noted; /search/results doc
  shows progress envelope.
- EVENTS.md: every event sample replaced with the aligned shape;
  cross-resource intro states _added/_updated are byte-identical to
  the REST list-item shape they mirror.

Comment cleanup:
- Removed PLAN.md / PLAN §N / internal-phase-naming references from
  in-code comments per the standing policy. Replaced "deferred to
  v0.2" parentheticals with their removal. Section-divider phase
  comments and pure phase-marker comments deleted. Strip is partial;
  the bulk-apply tool tripped a regex bug; this commit is the
  post-recovery point and a follow-up sweep will finish.

All 21 ctest binaries green; build clean.
@got3nks got3nks force-pushed the feature/amuleapi-v0 branch from d80bdc2 to 7016701 Compare June 20, 2026 17:00
@ngosang

ngosang commented Jun 20, 2026

Copy link
Copy Markdown
Member

Great work. It will take me a few days to test everything. The initial impressions are good, but I have a few questions/concerns that I'll leave here.

The first one is related to the workflow. I don't want to be working on huge pull requests for months. I would like a plan so we can gradually merge PRs without blocking bug-fixing releases. The first PRs can be large, but afterwards, I would like to work with small stabilization and bug-fixing PRs.

Another concern I have is about how the event system works. I haven't looked closely at the source code, but I've seen that you make requests using the EC protocol every 1 second. If this is the case, I don't think we have a good architecture. The most common use case will be a low-power NAS server running the AmuleD and AmuleAPI processes 24/7. There is only 1 client and 99% of the time it is not connected to the web interface. It is a waste of resources for AmuleAPI to be continually synchronizing and serializing information when nobody is consuming it. These processes have to be as efficient as possible to spend the CPU/Memory resources on downloads and uploads. We should rethink this use case. I like events, but servers shouldn't waste resources when no one is connected.

Regarding the events, they seem to be working well, but I think we can improve this. I have doubts about the scalability of the solution. My long-term goal is to have a WebUI with all the functionality of Amule / AmuleGUI. As you already know, I am sharing thousands of files, I have more than 100 concurrent uploads all the time... I don't know if the event system is prepared.

  • With just 3 files downloading (and 0 uploads) I'm receiving many events.
  • Most of the time, the same information is received (for example, the ed2k server doesn't change frequently).
  • There are no events with the search results (I have to do polling).

I'm going to need us to add the functionality of serving static resources in AmuleAPI. I have made an implementation here for my tests. => #220
If it sounds good to you, I would include it in this pull request and for now serve an index.html file indicating "AmuleAPI is running.". The goal is for the infrastructure to be ready when I submit my PRs, so that it can be tested by other users and setting up the development environment becomes easier.

I am testing several web technologies to see which one best fits the project. If you want to take a look, it is here, but I will probably throw everything away and start over. => #220

Bugs

In this PR, I see that the curl test files have non-descriptive names. They should be renamed so that they make sense to other developers.

The GET /shared endpoint is not working correctly. I have 3 downloads in progress and shared returns these results.
{"shared":[{"ecid":11,"hash":"8d6d3546ed1eee79559a6c92f59af514","name":"Linux Ubuntu 7.14 Desktop i386.iso","ed2k_link":"ed2k://|file|Linux%20Ubuntu%207.14%20Desktop%20i386.iso|659753388|8D6D3546ED1EEE79559A6C92F59AF514|/","size":659753388,"priority":"high_auto","complete_sources":0,"xfer":{"session":0,"total":0},"requests":{"session":0,"total":0},"accepts":{"session":0,"total":0}},{"ecid":744,"hash":"00000000000000000000000000000000","name":"","ed2k_link":"","size":0,"priority":"","complete_sources":8,"xfer":{"session":0,"total":0},"requests":{"session":0,"total":0},"accepts":{"session":0,"total":0}},{"ecid":755,"hash":"00000000000000000000000000000000","name":"","ed2k_link":"","size":0,"priority":"","complete_sources":0,"xfer":{"session":0,"total":0},"requests":{"session":0,"total":0},"accepts":{"session":0,"total":0}}]}

The GET /categories endpoint behaves strangely. When there are no categories, it returns an empty list. When there is 1 category, it returns 2 elements (category 0 / all and the category I have added). This makes the API inconsistent. I would always return category 0. We need to verify that category 0 cannot be edited.

The GET /search/results endpoint does not work properly. If I call it right after POST /search (global search) I get {"results":[],"progress":{"percent":100,"complete":true}}. If I call it after a few seconds I get many results, but the progress is at 0 progress { percent: 0, complete: false }

The download "parts" are not being sent. It is not necessary in the first version, but it is what is missing to be on par with amuleweb.

@got3nks

got3nks commented Jun 20, 2026

Copy link
Copy Markdown
Author

Thanks for the detailed look. Quick triage:

Workflow — agreed. Land amuleapi + the bug fixes below in this PR, then small stabilization PRs after.

Idle polling — you're right that the refresher loops at 1 Hz unconditionally (kTargetCycle = 1s at src/webapi/App.cpp:435), no subscriber-count gating. Before we design the fix, can you measure CPU/memory cost on your production node — top -p $(pgrep amuleapi) for a few minutes against your real workload — and share the numbers? If it's the architectural problem you're worried about, we'll see it. Worth noting: pausing the refresher entirely when no clients are connected would create a thundering-herd on reconnect, since amuled would dump the entire delta the moment we resume. Better path is probably reducing the cadence (e.g. 1 Hz → 0.2 Hz) when idle, keeping the state warm enough that reconnect is cheap.

Event volume / "many events" — 3 downloads × 1 update/sec is normal for a live UI, but coalescing per-tick deltas into a single batched event would cut framing overhead substantially. Will look at this with the idle-poll work.

Event volume / "same info" — the diff path has an Equal() whitelist per resource type; if you're seeing repeated server_updated for fields that didn't change, that's likely a missing field in the comparator or genuine churn (e.g. server users count bumping every poll). If you can capture a few seconds of the SSE stream when you see it, I can pinpoint which field is firing.

Search events — will add a search_finished SSE event so you can react to completion instead of polling /search/results blindly.

GET /shared ghost entries — partial clarification first: a download surfaces in /shared only once it has at least one verified chunk (eD2K shares hashed chunks, so a partfile with no completed parts is correctly absent). So "3 downloads → 3 shared entries" isn't quite the right expectation — it's "downloads with ≥1 completed chunk → shared entries". The two ghost entries with zero hash + empty name ARE bugs regardless — likely partfiles whose first chunk just completed but metadata hasn't propagated. Will filter zero-hash entries unconditionally. Could you send the matching amulecmd status (or equivalent state snapshot) at the moment of repro so I can confirm I'm fixing the right root cause?

GET /categories — quick clarification: amuled EC itself returns nothing when no custom categories exist, and includes category 0 once you add one. So we're faithfully mirroring EC's behaviour, not introducing the inconsistency. That said, normalizing at the API layer to always include category 0 is the right call for client predictability — will add.

GET /search/results — real bug, investigating. The progress.complete=true on an empty fresh result looks like a sticky flag from a previous search.

Download parts — they're shipped, but only on the detail endpoint (GET /downloads/{hash}) — see Api.cpp:1697 (include_parts=true). The list endpoint deliberately omits them because a multi-TiB file's parts array can be 100K+ entries. Will sharpen the REFERENCE.md note so it's clearer.

Curl test file names — true, the phase*.sh naming is a development artifact. Will rename to describe what each phase actually covers.

Static-serve infrastructure — agreed, will fold it into this PR. Taking the implementation from #220 with three hardening fixes: symlink containment (resolve realpath, reject if it escapes static_root), file-size cap (16 MiB), and ETag/If-None-Match so subsequent loads hit 304 instead of full re-read. Default placeholder index.html saying "amuleapi is running" so it's a no-op for API-only deployments and gives your frontend a stable hook.

…Snapshot

amuled's downloads (CPartFile in downloadqueue) and shared (CKnownFile
in sharedfiles + partfiles with PARTFILE_SHARED=true) views share a
single per-process ECID counter (CECID::s_IDCounter) and a unified
encoder-side CObjTagMap. Mirroring that on the receiver side avoids
the class of bugs we kept patching:

  - The /shared "ghost" rows we fixed in e57cbf9 were caused by the
    shared walker hitting an ECID that the downloads walker had
    already populated in a SEPARATE m_shared cache, with the
    PARTFILE_HASH child tag CValueMap-suppressed (already sent on
    a prior partfile-walker tick). We worked around it by composing
    a dl_identity_fallback map and passing it to the shared walker.

  - With one ECID-keyed FileSnapshot map, the shared walker finds
    the entry the downloads walker just touched, with hash + name
    already set. It just flips is_shared=true and merges its own
    fields. The fallback compose disappears.

Schema:

  struct FileSnapshot {
      uint32_t ecid;
      string   hash, name, ed2k_link;
      uint64_t size;
      string   priority;
      bool     is_downloading, is_shared;
      struct DownloadSide { ... } download;  // partfile-walker fields
      struct SharedSide   { ... } shared;    // knownfile-walker fields
  };

Role transitions:
  - Downloads walker: on PARTFILE entry, sets is_downloading=true and
    merges f.download. On FILE_REMOVED, clears is_downloading + resets
    download to default; drops the entry entirely if !is_shared.
  - Shared walker: same shape for is_shared/f.shared. PARTFILE_SHARED=
    false clears is_shared and resets the shared sub-block; the entry
    stays in the unified map (entity eviction is FILE_REMOVED's job).
  - Order-independent: running downloads-then-shared and shared-then-
    downloads converge to the same final state.

Reset-on-transition guarantees /downloads can never serve stale upload
stats, and /shared can never serve stale download stats — same lookup
hitting either view always reads from a freshly-walked sub-block.

API surface unchanged: GET /downloads filters m_files by is_downloading;
GET /shared filters by is_shared. Same wire shape; clients see no
difference.

Bonus from the unified map: FindDownload(hash) / FindShared(hash) now
hit a parallel std::unordered_map<hash, ECID> index (m_hash_to_ecid)
that's rebuilt at the end of every MutateDownloads / MutateShared.
Was O(N) linear scan; now O(1) hash lookup + O(log N) map find +
role-flag check. REFERENCE.md updated.

RefresherTest migrated to FileSnapshot + role-flag assertions
(32 cases, all pass). EventDiff diffs the unified files map with
role-flag awareness: download_{added,updated,removed} fires on
is_downloading transitions, shared_{added,updated,removed} on
is_shared transitions. A single tick can emit both for one ECID
(e.g. partfile becoming shared).

Full curl suite green: 27/27.

Refs amule-project#201 review.
@got3nks got3nks force-pushed the feature/amuleapi-v0 branch from 173c5f9 to 03700a6 Compare June 21, 2026 14:14
@got3nks

got3nks commented Jun 21, 2026

Copy link
Copy Markdown
Author

Five commits since the last update:

  • /categories always includes index 0 (e187b72): amuled's EC suppresses the whole EC_TAG_PREFS_CATEGORIES block when no custom categories exist, even though category 0 (the default incoming dir) is always present in amuled's data structures (GetCatCount() >= 1 always). Normalised at the API layer so clients see the same shape regardless of category count.
  • Curl-test rename (cd94269): the phase*.sh naming was a development artifact. The 27 scripts now live as NN-<what-it-covers>.sh — numeric prefix preserves the canonical dependency ordering in run-all.sh (auth before mutations, refresher consolidation before later read smokes, CORS / static-frontend last). ls reads as a feature inventory instead of a commit log.
  • Refresher-driven search lifecycle + SSE events (9dc4f36): GET /search/results progress was unreliable — amuled's EC_TAG_SEARCH_STATUS overloads raw=0 ("no search" / "Kad in progress" / "global queue not yet populated" / "global finished, m_searchInProgress just flipped off") and raw=100 ("global queue empty at start" / "global all done briefly before reset"). The lifecycle now lives in the refresher: POST /search marks active, RefresherTick polls and runs a pure state machine (AdvanceSearchProgress) that disambiguates via kind, prior saw_in_progress, results-count, and a 30 s defensive timeout. The timeout is gated by !saw_in_progress, so a healthy global search (which always climbs through raw ∈ [1..99]) excludes it the moment we observe the first non-zero raw — the timeout only catches "no climb, no results" edge cases. Kad's raw stays at 0 for the whole search (only 0xfffe flips it), so saw_in_progress never trips; we bump started_at on each new result instead, which distinguishes a productive Kad search from a stuck one. GET /search/results reads straight from state — no TtlCache. Two new SSE events: search_result_added per new ECID, search_finished on the complete=false→true edge. RefresherTest grows 12 cases covering every state-machine branch.
  • Declare the search SSE channel (2c85f7d): the prefix-based event_channel fallback already routed search_* to channel "search" in practice, but the mapping is now explicit so a future event family doesn't drift via the silent fallback. EVENTS.md gains the channel row + per-event payload docs; 26-rfc-followup-endpoints.sh adds positive (?channels=search delivers search_finished) and exclusion checks.
  • Unify downloads + shared into one ECID-keyed FileSnapshot (03700a6): bigger structural refactor — m_downloads + m_shared become a single m_files keyed by ECID, with is_downloading / is_shared role flags + sub-blocks for role-specific fields (reset on the true→false transition so /downloads can never serve stale upload stats, and /shared can never serve stale download stats). Both walkers operate on the same unified map; the shared walker no longer needs the dl_identity_fallback workaround introduced in e57cbf9 — identity is already in the entry from whichever walker saw it first. Same wire shape, no client breakage. As a bonus, FindDownload(hash) / FindShared(hash) get a parallel m_hash_to_ecid index that's rebuilt at each Mutate*, so hash lookups are O(1) instead of a linear scan over the map. Mirrors amulegui's CKnownFilesRem::m_items_hash design (amule-remote-gui.cpp:1507). RefresherTest + StateTest migrated to the role-flag model — 22 unit-test suites all pass.

Packaging fired on the fork: https://github.com/got3nks/amule/actions/runs/27907062647


Performance numbers, addressing the earlier concerns

Ran CPU + bandwidth measurements on a production node (busy uploader at ~33 MB/s) to put real numbers on the perf discussion:

Metric Idle (no subscriber) 1 SSE subscriber, full channels
amuleapi CPU 0.20% 0.27%
amuled CPU 62.4% 59.2%
SSE bytes/min 0 2,506 KiB
Events/min 0 4,254

Reading those:

  • Idle CPU is negligible — 0.20% is the cost of the 1 s EC poll + diff walk with zero subscribers. The diff still runs (single-publisher invariant requires it), but the EventBus's PublishBatch is a no-op when nobody's draining. On a low-power NAS that's well under any "wasted resource" threshold IMO, but it's worth measuring on actual ARM hardware to confirm.
  • Per-subscriber CPU is essentially free — +0.07% for handling 4,254 events/min. The serializer + ring-buffer Drain are doing their job.
  • Bandwidth is the only material cost — and only with an active subscriber. ~42 KiB/s on a very busy node (33 MB/s up + many peer-stats churning); on idle / typical nodes this falls dramatically. The client_updated channel is ~70% of the wire traffic in this sample (peer-stats churn for every active upload).

For the cellular-bandwidth concern, the cheap mitigation is HTTP Content-Encoding: gzip on the SSE response. Standard HTTP, transparent to browser EventSource / curl / SDK clients, no wire-format change. Given how repetitive the SSE JSON is (the same key names show up thousands of times), I'd expect 10–15% of original on the wire — roughly 4–6 KiB/s in the worst-case sample above, well into cellular-tolerable territory. Bandwidth is the real cost we should address; deferring to a follow-up issue once this PR and #220 land so we don't expand scope here. Compact / diff-only event formats are a bigger lift (SDK changes for every consumer) and probably never necessary if gzip alone is sufficient.

For the "disconnect amuled state baseline after N min of idle" idea — CPU doesn't look like the bottleneck on this sample (0.20% idle, +0.07% per active subscriber), but I'd want measurements on constrained hardware (low-end ARM NAS) and on a huge library before making a call. A lighter intermediate is to stretch RefresherTick to 5 s while no SSE subscriber is connected instead of pausing entirely — keeps state warm enough that the first reconnecting subscriber sees something close to current, while dropping the per-second EC roundtrip cost to ⅕. That's also a follow-up.

For comparison, amulegui uses the same per-second polling pattern while open (amule-remote-gui.cpp:220, 234, 627 all fire DoRequery(EC_OP_GET_UPDATE, EC_TAG_KNOWNFILE) on a wxTimer tick) — but because amulegui is a user-launched GUI process, the "what happens when nobody's looking" question doesn't exist there: the process closes. amuleapi is a 24/7 daemon, so we own that question.


@ngosang — I think we're at a point where this can merge into master once you've verified the fixes + features land cleanly on your side. It's strictly additive — no amuleweb legacy paths touched, the existing REST/SSE wire shape is unchanged, and the bandwidth / idle-tick-rate follow-ups are tracked separately so they don't gate the merge.

@ngosang

ngosang commented Jun 22, 2026

Copy link
Copy Markdown
Member

I have rebased my PR on top of this one and it works fine.

  • The web server for static files, it seems to be working fine. I have removed the web server I had implemented in my PR and I am using yours.
  • The issue with the "shared" files seems to be resolved. I have tried downloading more files and everything seems to be fine. The files appear in shared when the first chunk is downloaded.
  • The issue with categories is also resolved. When I only have category 0 (all) it is returned; if I have more, all of them are returned. It works as expected.
  • The search issue seems to be resolved as well. I have done several tests with the Global, Local, and Kad search types, and they all seem to work fine. On the frontend, I have implemented polling until the process finishes. Everything is working well.

Just a couple of comments.

I see that some endpoints accept the md4 hash and the ecid as parameters. IMHO, I think the ecid is something internal and should not be exposed in the public API. I think we should work only with the MD4 hashes of the downloads. There are several reasons not to expose the ecid:

  • First of all, I think we should make this change to simplify the API.
  • On the other hand, to avoid exposing an internal identifier, because I think that this internal identifier might stop working if the backend to amuled or amuleapi is restarted; those identifiers are no longer valid.
  • Lastly, because perhaps in the future we won't use the EC protocol and the connection will be directly with amuled.

For all these reasons, I think it would be better to work only with md4 hashes from the beginning and do the conversion in the backend between the hash and the ecid when necessary. I would not accept it as an identifier in the URLs nor would I return it in the body of the responses and events. I would always work with the md4 hash.

Regarding this PR, you can merge it whenever you consider appropriate. As for me, we can continue working without merging. I don't want to merge my PR until I have something more or less stable. I still need to evaluate the best solution to make the design responsive on smartphones, for internationalization, and whether to use a library to render the charts. It's going to take me quite some time, several weeks, to have something I am satisfied enough with to merge. I will surely be asking you for some changes to the APIs. This is a feature meant for the next release, therefore I am not giving it top priority. I am working on other projects at the same time.

@got3nks

got3nks commented Jun 22, 2026

Copy link
Copy Markdown
Author

Good point. The ECID exposure was initially load-bearing because ClientSnapshot.upload_file_ecid / download_file_ecid (the correlator fields linking a peer to the file they're uploading from / downloading) surfaced ECIDs as the only key — so a client watching client_updated events couldn't resolve the file without either (a) maintaining its own ECID→hash cache from prior /downloads polls, or (b) calling /downloads/{ecid}. That's why we accepted ECID in URLs and emitted it in event payloads.

After the unified FileSnapshot refactor in 03700a6, the ECID→hash resolution is essentially free on the server side — the unified m_files map is ECID-keyed, so m_files[ecid].hash is one std::map lookup. We can do the conversion at serialization time and never expose the ECID at all. Will fold the ECID-strip into this PR before merge:

  • /downloads/{key} and /shared/{key}: hash-only (drop the TryParseEcid disjunction)
  • Item bodies in /downloads, /shared, /search/results: strip ecid
  • SSE download_* / shared_* / search_result_added payloads: strip ecid
  • ClientSnapshot.upload_file_ecid / download_file_ecidupload_file_hash / download_file_hash, resolved server-side at serialization time
  • REFERENCE.md / EVENTS.md, curl test assertions, and RefresherTest cases updated to match

@ngosang

ngosang commented Jun 22, 2026

Copy link
Copy Markdown
Member

/downloads/{key}

Instead of calling it "key" or "ecid" or "id" could you call it "hash" or something similar everywhere? I think it's more clear.

got3nks added 3 commits June 22, 2026 20:13
Hashes (MD4, 32-char hex) are the only identifier the API exposes for
partfiles and shared files. File ECIDs were never wire-public:

- /downloads/{key} → /downloads/{hash} (PATCH/DELETE/detail)
- /shared/{key}    → /shared/{hash}
- removed "ecid" from /downloads, /shared, search-result event payloads
- /clients: upload_file_ecid / download_file_ecid → upload_file_hash /
  download_file_hash. The walker now resolves EC_TAG_CLIENT_UPLOAD_FILE
  and EC_TAG_CLIENT_REQUEST_FILE into MD4 hashes at refresher-tick time
  using an ECID→hash snapshot built from the unified file map.

Client and server ECIDs stay — they are still the URL key for
/clients/{ecid} and /servers/{ecid}, and there's no hash for either.

Tests updated: curl phases 04 + 17 drop the file-ECID assertions and
the PATCH /shared/{ECID} round-trip. All 22 cpp unit tests still pass;
all 27 curl phases still pass against the local amuled.

Closes the file-ECID strip discussed in amule-project#201.
Disambiguates from the file ECIDs we just stripped — `ecid` alone made
it look like a file identifier, when it's actually the peer's identity.

Affected surfaces:
- GET /clients[] payload
- client_added / client_updated SSE payloads
- client_removed SSE payload (`{ "client_ecid": N }`)

server_ecid stays as plain `ecid` — same field name as before, still the
URL key for /servers/{ecid}.

Curl phases 10 + 26 updated. All 22 cpp unit tests + 4 affected curl
phases (10, 22, 24, 26) pass.
CState's unified file map was std::map<uint32_t, FileSnapshot> with a
parallel std::unordered_map<string, uint32_t> hash→ECID index that got
rebuilt from scratch on every MutateDownloads / MutateShared call (one
rebuild per refresher tick). That O(N) per-tick scan was the dominant
cost on large libraries — 100k files × 2 walkers × 1 Hz = 200k iterations
per second spent re-deriving an index that almost never changed.

Replace both with a single FileMap wrapper class that owns both data
structures:
  * std::unordered_map<uint32_t, FileSnapshot> for ECID-keyed storage
  * std::unordered_map<string,   uint32_t>     for hash → ECID lookups

emplace() and erase() update the index inline (O(1) avg per change);
hash never mutates within an ECID so no other surface needs hooks.
Walkers keep their .find()/.emplace()/.erase() calls unchanged — the
wrapper exposes the same iterator surface. Per-tick rebuild deleted.

Lookups now both O(1) avg:
  * ECID  → FileSnapshot : FileMap::find(ecid)
  * hash  → FileSnapshot : FindEcidByHash(hash) + find(ecid)

CState::Downloads() / Shared() / Files() vectors are no longer in
ECID-ascending order (unordered_map iteration is bucket-defined).
Consumers already sort on their side (ngosang's feedback on PR amule-project#201),
so this is wire-visible but harmless.

REFERENCE.md tidied — the previous /downloads/{key} prose was left over
from the file-ECID-strip pass; consistent {hash} naming throughout now,
no internal-state details leaking onto the public API surface.

22/22 cpp unit tests + 27/27 curl phases pass.
@got3nks

got3nks commented Jun 22, 2026

Copy link
Copy Markdown
Author

@ngosang — the three follow-up commits just landed:

  • 1dd35a67 — dropped file ECIDs from the public API surface (/downloads/{hash}, /shared/{hash}, client.upload_file_hash / download_file_hash)
  • 3dc6370f — renamed clients[].ecidclient_ecid (disambiguates it from the file ECIDs that no longer exist on the wire). Heads-up: this changes the wire shape for GET /clients, the client_added / client_updated SSE payloads, and the client_removed payload ({ "client_ecid": N } now). Any front-end path that consumed the old ecid field — REST or SSE — will need a one-line update.
  • 898818a5 — switched the internal file cache to unordered_map with an inline hash → ECID index; both find by hash and find by ECID are now O(1) avg, no per-tick rebuild

All path placeholders in docs/api/REFERENCE.md are consistently {hash} now (no more {key} / "hash OR ECID").

Worth rebasing the front-end PR #220 on top — these are architectural changes and the wire format moved (file ECIDs disappeared from /downloads, /shared, client.*_file_*; clients[].ecid is now client_ecid), so a fresh integration pass will surface any consumer paths that still expect the old shape.

Plan from this side: leave PR #201 in draft (no merge) until 3.0.1 ships, which buys breathing room for the front-end testing pass to surface any other bugs / feature requests on amuleapi. Probably worth promoting the API to v1 before the merge, too — the spec is at the point where the v0 label is misleading.

@got3nks got3nks force-pushed the feature/amuleapi-v0 branch 2 times, most recently from 8025917 to 7a9ae82 Compare June 22, 2026 19:13
EVENTS.md never spelled out the call ordering for the snapshot + stream
combination. The naïve REST-first → SSE-second pattern silently drops
events that fire between the two — a one-line bootstrap bug that's
easy to write and hard to spot in testing.

New "Bootstrap: snapshot + stream" section explicitly:
- Names the race (REST→SSE loses events fired in the gap)
- Documents the correct order (open SSE first, then GET, apply events
  as upserts on top of the snapshot)
- Explains why the merge is consistent (events for tick N are emitted
  after the state mutation for tick N is committed)
- Gives a JS bootstrap example with parallel snapshot fetches
- Points at the resync recovery rules for the failure modes
@got3nks got3nks force-pushed the feature/amuleapi-v0 branch from 7a9ae82 to 6f32a7a Compare June 22, 2026 19:19
got3nks added 3 commits June 22, 2026 21:40
Bootstrap-companion gaps the doc didn't cover, found while sanity-checking
the bootstrap example against real-world client paths:

- Connecting / Browser example: add an `error` listener with the
  `readyState === CLOSED` terminal-failure check, and clarify that the
  native browser EventSource API can't carry an `Authorization` header
  so browser bearer-on-SSE requires a polyfill.
- Channels and filtering: note that the snapshot fetches in the bootstrap
  must mirror the `?channels=` filter; pulling unsubscribed collections
  leaves them silently stale.
- Event catalog intro: call out that `status_changed` is a full envelope
  replacement and `log_appended` is an append — they don't fit the
  `_added` / `_updated` / `_removed` upsert pattern.
- Search / Logs channel sections: explicit pointer back to Bootstrap
  noting that `/search/results` and `/logs/amule` aren't fetched there
  by default, with the case for adding them.

Plus a tidy-up: every `§Section` cross-ref now uses a clickable markdown
anchor link instead of the section-symbol convention (matches the
REFERENCE.md links and renders properly on GitHub).
The existing EC_TAG_SEARCH_STATUS overloads a single uint32 to mean a
lot of different things — raw=0 covers "no search", "Kad in progress",
"global queue not yet populated", and "global just finished"; raw=100
covers "global queue empty at start" and "global all done briefly".
0xfffe = Kad done, 0xffff = global done (or local in progress). The
desktop GUI and amulecmd live with this because their UI state +
explicit notifies cover the gaps; for headless consumers driving a
JSON wire surface (amuleapi) it forces a defensive state machine
(saw_in_progress tracking, 30 s timeout, started_at bump-on-result).

Add three new optional tags on the EC_OP_SEARCH_PROGRESS response so
modern consumers can read the lifecycle directly without the sentinel
decode, while pre-3.1 consumers (amulegui, amulecmd, amuleweb) keep
reading EC_TAG_SEARCH_STATUS unchanged:

  EC_TAG_SEARCH_LIFECYCLE_STATE  uint8   0=idle / 1=running / 2=finished
  EC_TAG_SEARCH_LIFECYCLE_KIND   uint8   0=local / 1=global / 2=kad
  EC_TAG_SEARCH_RESULT_COUNT     uint32  size of m_results[m_currentSearch]

Derivation in CSearchList::GetSearchLifecycleState():
- m_currentSearch == -1               → IDLE  (no search this session)
- m_searchType == KadSearch           → m_KadSearchFinished ? FINISHED : RUNNING
- otherwise (Local / Global)          → m_searchInProgress  ? RUNNING  : FINISHED

EC_TAG_SEARCH_STATUS is unchanged. amulegui / amulecmd / amuleweb iterate
tags they recognise and skip unknowns — verified by a full build of all
four binaries (amule, amuled, amulegui, amulecmd) against the same EC
core.

No per-searchID tracking yet — Get_EC_Response_Search still pins the
sentinel 0xffffffff. Lifting that is a separate (larger) change.
amuleapi pins a daemon version carrying the new EC_TAG_SEARCH_LIFECYCLE_*
tags landed in the previous commit, so there's no reason to keep the
sentinel-disambiguation state machine around as a fallback. Replace it
with a single ten-line mapping from (lifecycle_state, raw_pct) to
(active, complete, percent).

Refresher.cpp/h: AdvanceSearchProgressFromLifecycle becomes the only
AdvanceSearchProgress. The legacy state machine, its 60-line comment
block, and the defensive 30 s timeout / saw_in_progress / Kad-deadline-
bump-on-result tricks are all gone.

RefresherTick.cpp: drops the two-path branch that picked between legacy
and lifecycle. Reads both EC_TAG_SEARCH_STATUS (for the global percent)
and EC_TAG_SEARCH_LIFECYCLE_STATE every tick; passes both to the helper.

State.h: SearchProgressSnapshot loses saw_in_progress, raw, started_at,
last_results_count — all of which only existed to drive the old
disambiguation. Keeps active / kind / percent / complete (the four
fields surfaced by GET /search/results and the search_finished SSE).

State.cpp: MarkSearchStarted no longer stamps started_at — that field
no longer exists.

RefresherTest.cpp: 12 legacy tests covering every branch of the
sentinel-decode machine drop to 5 trivial tests of the new mapping
(running with global percent, running masks percent for Kad / local,
finished sets complete, idle defensive zero).

Net diff: -201 lines, same surface contract on /search/results and
search_finished, and zero magic-number sentinel decoding on the
amuleapi side.

22/22 cpp unit tests + 27/27 curl phases pass against the patched daemon.
@got3nks got3nks changed the title Standalone amuleapi REST API + SSE daemon (supersedes #132) amuleapi REST API + SSE daemon + EC search-lifecycle additions Jun 22, 2026
@got3nks

got3nks commented Jun 22, 2026

Copy link
Copy Markdown
Author

Two more commits landed on the branch:

  • ef0e40d6feat(ec): unambiguous search lifecycle tags on EC_OP_SEARCH_PROGRESS. Three additive tags (EC_TAG_SEARCH_LIFECYCLE_STATE, _LIFECYCLE_KIND, _RESULT_COUNT) so consumers can read the lifecycle directly instead of disambiguating the overloaded EC_TAG_SEARCH_STATUS sentinels (raw=0 covering "no search" / "Kad in progress" / "global queue populating" / "global just finished", raw=100 covering "queue empty at start" / "all done briefly", etc.). EC_TAG_SEARCH_STATUS is unchanged on the wire — amulegui / amulecmd / amuleweb keep working without a single line of change. Full build of amule + amuled + amulegui + amulecmd is green.
  • 15a02b33refactor(amuleapi): consume EC_TAG_SEARCH_LIFECYCLE_STATE directly. Drops the sentinel-disambiguation state machine in Refresher.cpp entirely (the saw_in_progress tracking, the 30 s defensive timeout, the Kad-deadline-bump-on-result trick). Net -201 LOC, -7 legacy tests, +5 trivial tests of the new mapping.

Heads-up for testing: since amuleapi pins the new tags as required, this branch's amuleapi daemon must talk to the patched amuled from the same branch. Stock 3.0.0 or master amuled won't carry the lifecycle tags and amuleapi will sit there with active=true forever.

Cross-platform builds for this branch (amuled + amulegui + amuleapi + amulecmd + amuleweb, plus AppImage / Flatpak / macOS Universal2 / Windows) are running here — artifacts will be downloadable when the run finishes:

https://github.com/got3nks/amule/actions/runs/27984668094

@ngosang

ngosang commented Jun 22, 2026

Copy link
Copy Markdown
Member

I updated my PR and I found some errors in the documentation + 1 outdated comment in the code => e134f06

1. Search progress event

The amuleapi SSE search channel currently emits search_result_added and search_finished, but nothing in between carrying the in-progress percent. The only place a client can read progress.percent mid-search is GET /api/v0/search/results (there is no separate /search/progress endpoint — progress is bundled into the results response). As a result, a web client that wants to show a live "Searching… NN%" indicator is forced to keep polling /search/results on a timer, which defeats the purpose of the event channel (the rest of the UI is snapshot-on-mount + stream, no periodic polling).

Request

Expose search progress over SSE so clients can drive the whole search lifecycle from events and drop the periodic GET /search/results poll. Two viable shapes (either works; a dedicated event is cleaner):

  • Preferred — a new search_progress event on the search channel, fired when progress.percent (or active) changes between refresher ticks, mirroring status_changed:
    { "percent": 67, "active": true, "kind": "global", "results": 142 }
  • Alternative — add percent to the search_result_added payload, so each new result also carries the current progress. Less clean (couples progress to result arrival; no signal when the percent moves but no new result lands).

Why this is cheap to add

The daemon already computes the normalized progress every refresher tick (src/webapi/Refresher.cpp::AdvanceSearchProgressSearchProgressSnapshot {active, kind, percent, complete}), and src/webapi/EventDiff.cpp already publishes search_result_added / search_finished from that same diff pass. A search_progress diff event would slot in right next to them — emit it when the snapshot's percent/active differs from the previous tick.

2. For KAD search

  • Global searches have a meaningful incrementing percent; Kad searches only report 0 → 100.

It's possible to estimate the % based on time? For example, it there is a timeout we can update the progress +10% when time pass.

@got3nks

got3nks commented Jun 22, 2026

Copy link
Copy Markdown
Author

Forgot to port the REST API updates over to the SSE events, will do tomorrow.

got3nks added 3 commits June 23, 2026 12:01
EC_OP_SEARCH_PROGRESS gains a fourth lifecycle tag,
EC_TAG_SEARCH_LIFECYCLE_PERCENT (0x070D): a unified 0..100 completion the
daemon computes for every search kind. Global reuses the real
server-queue percent; Kad -- which has no measurable mid-flight progress
-- gets a cosmetic time-ramp off the fixed SEARCHKEYWORD_LIFETIME (45 s),
capped at 99 until the authoritative finished lifecycle state snaps it to
100 (an early finish on SEARCHKEYWORD_TOTAL snaps it too); local is
instantaneous.

CSearchList stamps m_searchStart in StartNewSearch and derives the value
in GetSearchLifecyclePercent(). EC_TAG_SEARCH_STATUS is unchanged on the
wire, so amulegui / amulecmd / amuleweb keep working untouched -- only
modern consumers (amuleapi) read the new tag.
…es search_finished)

Port search status to the event channel. EventDiff now emits a
search_progress event on every percent change while a search runs and on
the running->finished edge; the terminal frame carries state="finished",
so the old standalone search_finished event is retired (it was just the
completion case). Payload: {state, percent, results, kind}.

The refresher reads the daemon's new EC_TAG_SEARCH_LIFECYCLE_PERCENT
directly, so AdvanceSearchProgress drops its kind-special-casing and the
old EC_TAG_SEARCH_STATUS percent decode -- the daemon already computes a
clean 0..100 for every kind (Kad ramp included). GET /search/results'
progress object is brought to parity with the event: state/kind/percent,
with the canonical state string replacing the redundant complete/active
booleans.

Tests: RefresherTest gains pass-through/clamp cases for the unified
percent; curl 19-search adds a live Kad-ramp assertion (percent climbs,
capped at 99 while running); 22/26 assert search_progress in place of
search_finished.
…drift

Document the new search_progress event and the REST progress object
(state/kind/percent), folding in the retired search_finished.

Also correct documentation drift confirmed against the actual JSON
writers (several flagged in amule-project#220): /status gains kad.network; /clients
gains the full peer field set (software_version, os_info, ident_state,
xfer, queue/score/obfuscation/friend_slot); /servers gains
description/version/max_users/ping_ms/failed; /kad gains
firewalled_udp/in_lan_mode/ip/indexed/buddy; /logs/amule is
{lines,total_cached,returned}; /logs/serverinfo is a {text,*_bytes} blob;
/stats/tree is {nodes:[{label,children}]}; /stats/graphs is rich point
objects with a width param; /search/results entries use
sources:{total,complete}+already_have+rating. Drops the stale per-GET
search TTL-cache note (search is refresher-driven now) and the
log_appended {buffer} field / phantom search_result_added ecid.
@got3nks got3nks force-pushed the feature/amuleapi-v0 branch from 549a1c6 to d87b0ea Compare June 23, 2026 10:09
@got3nks

got3nks commented Jun 23, 2026

Copy link
Copy Markdown
Author

Follow-up commits on feature/amuleapi-v0 porting search status to the event channel:

  • c866b65c3 feat(ec) — adds EC_TAG_SEARCH_LIFECYCLE_PERCENT to EC_OP_SEARCH_PROGRESS: a daemon-computed 0–100 for every search kind (global = real server-queue %, Kad = a cosmetic time-ramp off the fixed 45 s keyword lifetime, finished → 100). EC_TAG_SEARCH_STATUS is unchanged on the wire, so amulegui/amulecmd/amuleweb are untouched.
  • 84afaa2b6 feat(amuleapi) — new search_progress SSE event (running frames + a terminal state:"finished" frame); it supersedes and removes search_finished. GET /search/results's progress object now mirrors the event field-for-field (state/kind/percent).
  • d87b0eae8 docs — documents the above and fixes REST/SSE doc drift verified against the JSON writers: kad.network on /status, the full /clients + /servers field sets, /kad detail fields, the /logs/amule + /logs/serverinfo shapes, /stats/tree + /stats/graphs, /search/results entries, plus the log_appended {buffer} and the phantom search_result_added ecid.

⚠️ Breaking for API consumers:

  • The search_finished SSE event is removed — subscribe to search_progress instead and treat the terminal state:"finished" frame as completion.
  • GET /search/results's progress object drops the complete and active booleans in favor of the canonical state string (running / finished / idle); derive complete = state == "finished", active = state == "running".

Because the EC tag is a wire change, this needs a freshly built amule/amuled — prebuilt binaries (appimage/flatpak/macos/windows) land on the packaging run here once it finishes: https://github.com/got3nks/amule/actions/runs/28018726308

@ngosang — heads-up that your frontend in #220 will need a rebase onto this; and since d87b0eae8 already carries verified-against-code EVENTS.md and REFERENCE.md, you can drop those two files from #220 on the rebase.

…ct progress doc claim

search_result_added now emits `sources:{total,complete}` byte-for-byte
identical to WriteSearchObject (the /search/results[] entry), like every
other collection event mirrors its REST writer. It previously flattened
to sibling `sources` + `complete_sources` integers -- the lone event
that diverged from its REST shape.

Also corrects the overclaim that the REST /search/results `progress`
object mirrors the search_progress SSE event 'field-for-field': the
event additionally carries a `results` count (it has no results array
beside it). Fixes the matching stale comment in Api.cpp plus the
EVENTS.md / 19-search.sh references to the removed progress.complete
output boolean (state is canonical now).
@got3nks

got3nks commented Jun 23, 2026

Copy link
Copy Markdown
Author

@ngosang — one more ⚠️ Breaking API change on the branch since the summary above: 9e7f98e35 alters the search_result_added event payload your frontend parses.

sources is now a nested {total, complete} object — byte-for-byte identical to a /search/results array entry — instead of the previous flat sources + complete_sources integers:

{ "hash": "", "name": "", "size": …, "sources": { "total": 12, "complete": 7 }, "already_have": false, "rating": 0 }

So a single result-rendering function can handle both the GET /search/results array entries and search_result_added events with no special-casing.

Refreshed binaries (built off 9e7f98e35) are on the new packaging run: https://github.com/got3nks/amule/actions/runs/28032209221

@got3nks got3nks added this to the 3.1.0 milestone Jun 23, 2026
@got3nks got3nks marked this pull request as ready for review June 24, 2026 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants