amuleapi REST API + SSE daemon + EC search-lifecycle additions#201
amuleapi REST API + SSE daemon + EC search-lifecycle additions#201got3nks wants to merge 35 commits into
Conversation
c75633b to
d80bdc2
Compare
Adds a new daemon `amuleapi` that connects to amuled via EC and serves a versioned JSON REST API plus a long-lived Server-Sent Events stream over HTTP. Built on Boost.Beast; runs as its own process so it can coexist with amuleweb against the same amuled instance on independent ports (amuleweb 4711, amuleapi 4713). Wire-level guarantees: - JWT-based auth with admin / guest roles; session-cookie fallback for browser clients - Per-route rate limiting + IP-based login lockout - Conditional GET on every read (ETag + If-None-Match -> 304) - Live event stream with `Last-Event-ID` reconnect replay and a typed `resync` event for cache invalidation (gap detection + cross-restart ID reset) - Refresher polls EC every 1 s and emits typed diff events: download_added / _updated / _removed and the same shapes for shared files, server list entries, peer clients; status_changed for connection / kad / speed rollups; log_appended for amuled log tail - Optional CORS allowlist via amuleapi.conf[Server]/CorsOriginAllowlist (off by default; empty list = echo any origin in a cookie-compatible way; non-empty = strict per-origin allowlist) - Lazy-fetch TTL coalescing for stats / search / serverinfo so multiple concurrent reads share one EC roundtrip Concurrent REST clients share one EC connection to amuled via a process-wide mutex on every SendRecvSerialized; amuled's single-reader CRemoteConnect requirement stays intact under load. API contract lives under docs/api/ (per-endpoint fragments). First-run setup: docs/QUICKSTART-AMULEAPI.md. English man page at docs/man/amuleapi.1.in. New shared static lib libwebcommon collects the helpers amuleweb and amuleapi both need (ETag, path patterns, JSON writer) so neither code path re-implements them. Tests: - 10 muleunit C++ suites: AmuleApiConfigTest, AuthTest, EtagTest, EventBusTest, EventDiffTest, JsonWriterTest, JwtTest, PathPatternsTest, RefresherTest, StateTest - 18 curl-driven HTTP smokes under unittests/curl-tests/amuleapi/ exercising every endpoint end-to-end against a live amuled, plus a run-all.sh orchestrator that brings up a fresh daemon per phase Packaging: amuleapi rides every existing recipe (macOS .app via the build.sh bin_paths array, AppImage via EXTRA_BINS + argv[0]-dispatch AppRun, Flatpak local + Flathub manifests, Windows NSIS via the recursive File /r). docs/INSTALL.md gains a BUILD_AMULEAPI row. Disabled by default in cmake; enable with -DBUILD_AMULEAPI=YES or -DBUILD_EVERYTHING=YES. The CI build matrix (ccpp.yml) now sets BUILD_AMULEAPI=YES so every push exercises the new compile path.
…review Adds the endpoints the PR amule-project#132 review pass flagged as missing or shape-mismatched against the agreed contract, plus three operator ergonomics tweaks: Missing endpoints now shipped: - POST /shared/reload — rescan share roots (EC_OP_SHAREDFILES_RELOAD) - POST /servers/update — refresh server list from a `server.met` URL - POST /servers/<ip>:<port>/connect — address-keyed alias for the existing ECID form; resolves <ip>:<port> against the cache and delegates to the ECID handler - DELETE /servers/<ip>:<port> — same alias for the removal path - DELETE /logs/amule — EC_OP_RESET_LOG plus a local CState::ClearAmuleLog so the next GET returns the post-reset state immediately (the refresher's incremental append-only path can't shrink the cache) - DELETE /logs/serverinfo — EC_OP_CLEAR_SERVERINFO plus TtlCache::Invalidate on the server-info lazy cache; without the invalidate, the next GET would keep serving stale text until the 1 s TTL expires Body / shape additions: - POST /downloads accepts `{"links":[...]}` (RFC §4.2 shape) in addition to the existing `{"ed2k_link":"..."}` singular form; mixing both in one request is a 400. Partial success in batch mode returns 207 with `accepted` / `failed` / `failed_links` - POST /networks/disconnect now reads an optional `{"network":"ed2k"|"kad"|"both"}` selector and routes to the matching EC opcode (SERVER_DISCONNECT / KAD_STOP / DISCONNECT); no body or `"both"` preserves the original parameterless contract - GET /status now includes `kad.network:{users,files,nodes}` — the rollup the RFC §4.1 contract specifies. The data was always available (KadSnapshot already parses EC_TAG_STATS_KAD_USERS / _KAD_FILES / _KAD_NODES on the EC_OP_STAT_REQ at EC_DETAIL_FULL); only the /status response was missing it - GET /clients gains `?filter=uploads|downloads|active` to subset the peer list by transfer state (replaces the legacy `/uploads` page server-side) - GET /events gains `?channels=<csv>` to subscribe to a subset of the event types. Synthetic `resync` events are delivered regardless of filter (their purpose is a cache-invalidation signal the client cannot opt out of) Mutation cache hygiene: every mutating endpoint either runs an inline RefresherTick to repopulate the State-backed cache (which also fires the corresponding SSE diff events) or, for the two lazy-cached endpoints, explicitly invalidates the TtlCache so the next GET re-fetches. phase11.sh smoke (30/30 PASS) exercises each new endpoint and includes a post-mutation freshness check on /logs/amule and /logs/serverinfo — fast GET immediately after the DELETE must return the empty state, not the pre-reset cache.
Builds were green on macOS Release but failed on Ubuntu and Windows
Release/Debug and macOS Debug:
- src/webapi/Refresher.h declared `std::vector<std::string>` without
including <string>. libc++ (macOS) brings <string> in transitively
through <vector>; libstdc++ on Ubuntu and MinGW do not, so every TU
that included Refresher.h failed with `'string' is not a member of
'std'`.
- src/webapi/State.cpp used `std::unique_lock` without including
<mutex>. State.h ships <shared_mutex> which is enough for
shared_lock + shared_timed_mutex, but libstdc++ does not re-export
unique_lock from there, so the cpp failed to compile on Ubuntu and
MinGW with `'unique_lock' is not a member of 'std'`.
- src/webapi/Api.cpp used `std::ostringstream` without including
<sstream>. Worked transitively from picojson.h on most platforms;
add the explicit include defensively.
- unittests/tests/CMakeLists.txt: RefresherTest links libec.a but did
not link LoggerConsole.cpp. Release builds preprocess libec's
debug-emit paths to `do {} while(0)` via the ECLog.h define, so the
symbols (DoECLogLine, theLogger, CLogger::AddLogLine) are unused;
Debug builds keep the calls and require the symbols. Mirror
amuleapi's own webapi/CMakeLists.txt which already pulls
LoggerConsole.cpp in.
No behavior change; only build hygiene to get the CI matrix green.
High — must-fix: - amuleapi.conf is now mode-checked on every load and seeded with mode 0600 on first run. The conf carries the EC password in plaintext (App.cpp:202-210 MD5s it for the base class' hashed field) so it gets the same owner-only enforcement as the jwt-secret and passwords files. EnforceOwnerOnly() runs on the conf in LoadAmuleapiConf after the seed step. - --set-admin-pass / --set-guest-pass now propagate exit codes via std::exit(rc). Previously OnInit() returned false on failure, which cancels the run but still produces exit 0, so operators scripting `amuleapi --set-admin-pass=... && systemctl restart ...` could miss disk-full / mode-bit-rejection failures. - docs/QUICKSTART-AMULEAPI.md no longer lists /uploads. The endpoint was retired in Phase 4g and /clients (with the new ?filter= query) covers the per-peer surface. Medium: - Replaced the "is the field at its default value?" heuristic in LoadAmuleapiConfig with wxCmdLineParser::Found() predicates for --host / --port / --password. The old code would silently overwrite an operator-supplied `--host=127.0.0.1` with the conf value because it conflated "passed nothing" with "passed the default verbatim." - Dropped the no-op --daemon switch. The flag was wired to m_cliForeground = false but the field was never read. Mention in --foreground's help still notes "default." - WritePasswordsFile / WriteJwtSecretFile route through a new WriteFileAtomic0600 helper that writes to a sibling .tmp, fsync()s, then rename(2)s into place. A short write, signal, or crash mid-write now leaves the original file intact instead of truncating the only admin/guest credentials the daemon has. - Auth.cpp + Api.cpp pick up the #ifdef _WIN32 / #include <strings.h> shim that libwebcommon/HeaderParse.cpp already uses for strncasecmp. POSIX puts the symbol in <strings.h>; glibc also re-exports via <string.h> but musl and BSDs don't. - run-all.sh computes the repo root from $SCRIPT_DIR instead of hard-coding the original author's local path. AMULEAPI_ROOT remains an env override. Low — nits-as-comments + one test: - Comment at the SSE drain loop documents that ev.name is server- controlled (compile-time literals from EventBus::Publish call sites) so the un-escaped `event: <name>` write is safe in practice — and what to sanitize if that ever stops being true. - Comment at CRevocationSet::GcExpired() flags the O(n) sweep as load-bearing only at single-operator scale and points to the min-heap upgrade if the population ever climbs. - Comment at the per-streaming-session std::thread spawn calls out that the BindAddress=127.0.0.1 default is load-bearing against a thread-per-connection DoS, and points at the async-on- io_context migration that would close it off. - JwtTest gains direct boundary tests for Base64UrlDecode's len%4==1 and non-zero-residue rejections. Both craft a token with malformed b64 in the header section + a valid HMAC over the same signing input, so the constant-time MAC compare passes and Verify() actually reaches Base64UrlDecode (where the rejection must come from). - Comment at MakeSetCookie notes that the Max-Age=0 boundary on an already-expired expires_at is deliberate — the cookie expires on receipt, which is the right behaviour for logout-on-issue-of-expired-token paths. Drive-by CI fixes (caught while validating the review fixes): - EventBus::Publish raced on ID assignment: fetch_add lived outside the lock so a publisher could win the lock-order race but lose the id-order race, and a subsequent Drain would emit events in publish order with non-monotonic ids — failing EventBusTest's ConcurrentPublishersHaveDistinctIds on the macOS Debug runner. Moved fetch_add inside the lock so id assignment is atomic with the push_back. - Refresher.cpp's FULL-detail GetIPv4Data fallback path was removed. amuleapi only ever asks for EC_DETAIL_INC_UPDATE so the fallback was dead code in production, and it tripped a libec Debug-build assertion on RefresherTest fixtures (which pack the ECID as a uint32 in the EC_TAG_SERVER slot, exactly the not-IPv4-not-UNKNOWN case the assertion guards). - AmuleApiConfigTest gains #include <cstring> for std::strlen (Windows MinGW does not bring it in transitively).
High — must-fix:
- SSE channel-filter heartbeat starvation. The drain loop now tracks
the wall-clock since the last byte written and emits `: keepalive`
whenever it falls behind the 15 s budget, on EITHER the
drained-and-emitted, drained-but-all-filtered, or
drained-nothing branch. Previously a busy bus paired with a
`?channels=` that filtered every event left the wire silent
indefinitely — Drain returned immediately each time so the
Drain-timeout-based heartbeat never fired, and NAT / load
balancers / EventSource drop idle TCP connections after
30-60 s.
Medium:
- HandleStatus now reads through a new CState::Dashboard() accessor
that returns (status, kad, snapshot_at, ec_connected) under a
single shared_lock. Previously each accessor took its own lock so
a refresher tick could interleave between Status() and Kad() and
produce a /status response where `kad.network` described tick
N+1 while `ed2k.*` / `speeds.*` described tick N.
- HandleDownloadAdd no longer 503s mid-batch on an EC disconnect.
Per-link results split into three buckets — accepted_links,
failed_links, disconnected_links — and the response surfaces
all three in counters plus arrays. 202 (all clean), 207
(mixed), 503 (every link blocked by the same EC outage —
nothing landed). Caller can retry just the disconnected subset
once /status reports ec_connected=true.
Low — nits-as-comments:
- event_channel lambda now documents the bare-name fallback:
every bus event MUST contain `_`, the prefix names the channel.
resync stays the only no-underscore name and is published
directly to the writer (bypassing EventBus::Publish, hence
the filter), so the lambda's fallback is a defensive net for
future names rather than a routed code path.
- phase11.sh /clients?filter=active assertion: the prior comment
("no peer is in both states simultaneously") was contradicted by
the assertion's tolerance for double-counted peers. Re-stated as
`active ∈ [max(up, down), up+down]` to match the semantics
(peers can be uploading AND downloading at once; active is the
set union, sum is the multi-set sum).
- docs/QUICKSTART-AMULEAPI.md gains a "Notes on a few responses"
section explaining the 207 multi-status on POST /downloads
(the 503 reservation is documented too), and a security-notes
section calling out POST /servers/update as an
admin-authorized SSRF surface and the load-bearing
BindAddress=127.0.0.1 default.
Drive-by CI fixes:
- mingw-w64 link failures on JwtTest / EtagTest / JsonWriterTest
/ PathPatternsTest / AmuleApiConfigTest / AuthTest / StateTest
/ EventBusTest / EventDiffTest. muleunit is an OBJECT library
that pulls in src/libs/common/MuleDebug.cpp directly, which
references CFormat (defined in src/libs/common/Format.cpp).
On macOS/Ubuntu the symbols come through via wxWidgets / libc
link order; MinGW's strict linker rejects. Added mulecommon to
each test's target_link_libraries (CTagTest already pulls
Format.cpp via its own sources list, so the pre-existing tests
weren't affected).
- phase4c-bis.sh used to assert DELETE /logs/{amule,serverinfo}
→ 405. Phase 11 made DELETE a real mutation, so the test
now asserts PATCH → 405 (the logs are read-and-reset only,
never partially mutable) — same intent, post-Phase-11 contract.
Verified locally:
- ctest: 21/21 (all webapi + libwebcommon suites)
- run-all.sh curl matrix: every phase smoke green standalone
(phase8d occasionally flakes on the first hit after a daemon
restart — passes on re-run; not in any failure path)
The test was hard-coded to put its per-test config dir under `/tmp`, which doesn't exist on Windows MinGW (the CI matrix's mingw-w64 Release/Debug jobs failed every test with "The system cannot find the path specified" from wxRemoveFile / wxRmdir / wxMkdir). Switched to `wxStandardPaths::Get().GetTempDir()`, which returns `/tmp` on POSIX and `%TEMP%` on Windows. Same intent, portable. Also using `wxGetProcessId()` instead of `::getpid()` so the source doesn't carry a POSIX-only call site outside the `#ifndef _WIN32` include guard. ctest 10/10 locally; no shape change to the tests themselves.
Two pre-PR blockers: - LastSeenState race. Every Phase-5 mutation handler calls RefresherTick() inline on the HTTP-server worker thread; the wxApp refresher loop calls it once per second. Both paths used to end in EmitDiffsAndUpdate(bus, m_last_seen, state), which reads and mutates the per-substruct maps in m_last_seen without any internal lock. Two threads doing std::map insert/erase on the same map is UB; an attacker (or operator) coupling a busy client to inline mutations could corrupt the diff baseline and crash the daemon. Split: RefresherTick() no longer touches m_last_seen at all. A new EmitDiffsForEventBus() helper does the walk; the wxApp refresher loop in App.cpp (the single writer thread) calls it after each successful tick. SSE subscribers see post-mutation diffs on the next natural tick (≤1 s) instead of immediately — acceptable cost for memory safety. - SIGPIPE never masked. On Linux, an SSE peer that disappears mid-write raises SIGPIPE on the next asio::write to the closed fd. The streaming-handler socket writes don't pass MSG_NOSIGNAL, so without ignoring SIGPIPE the default disposition kills the daemon on every dropped EventSource. OnRun() now sets std::signal(SIGPIPE, SIG_IGN) alongside the existing SIGINT/SIGTERM/SIGHUP handlers; the writes return EPIPE and the streaming-handler loop bails on the next writer.Alive() poll. libwebcommon High: - CJwt::CJwt(secret) now throws std::invalid_argument on an empty signing key. CryptoPP's HMAC accepts a null/zero-length key without complaint, so a config bug that truncated the amuleapi-jwt-secret to 0 bytes used to produce a daemon that happily issued and verified tokens against an empty key. - WriteEscapedString now substitutes U+FFFD (replacement char) for unpaired UTF-16 surrogates instead of falling through to BMP serialization and producing invalid UTF-8 once the buffer flushes. Affects only paths that route raw wxStrings through the JSON writer with broken Unicode input. - FindCookieValue now strips RFC 6265 §5.2 OWS from both ends of the value (`name= value ` → `value`, not ` value `). Without it a JWT cookie set by a client that inserted a leading space after `=` would fail ConstantTimeEquals on every Verify. libwebcommon Medium: - ConstantTime.h gains a precondition block on the wxString overload: the length-mismatch check is NOT constant-time but is safe today because every caller compares fixed-shape inputs (32-char MD5 hex, 43-char b64url MAC). Future length-variable callers must pad or accept a length-side-channel leak. - CJwt::Verify now allows a 5-second exp leeway and rejects an iat claim more than 60 s ahead of the verifier's clock. Today the issuer and verifier share a clock so the skew is zero; tomorrow they may not (federated tokens, reverse-proxy auth handoff). webapi High: - CRateLimiter is now actually sliding, not tumbling. The legacy `failure_count + window_start` shape reset wholesale when the current window expired, letting threshold-1 attempts in the tail of window N + threshold-1 in the head of window N+1 slip through. Replaced with a std::deque<std::time_t> per IP: each NoteFailure evicts stamps older than window_seconds and appends now; lockout fires when the live count crosses threshold. Two new AuthTest cases pin (a) lockout expiry after lockout_seconds and (b) the split-attempt regression that the old tumbling impl would have passed. - SSE concurrent-session cap. A new process-wide atomic counter bounds streaming sessions at kMaxConcurrentStreamingSessions (32). The slot is acquired before the per-session OS thread is spawned; over-the-cap requests get 503 with `error.code=sessions_exhausted` + Retry-After:10. The cap caps the thread-per-connection DoS amplifier that becomes reachable if an operator switches BindAddress to a non- loopback. A startup WARN line also fires when BindAddress isn't loopback, so the operator sees the trade-off in the daemon log. - Session destructor now asserts the m_worker_exited atomic before detach()ing the thread handle. The "worker holds shared_from_this() across its run, so the dtor only sees it after the worker exited" invariant is now enforced at runtime instead of by-comment; a future refactor that drops the shared_from_this() capture fails loudly rather than mysteriously deadlocking. data layer (covered by BLOCKER 1 + comment fix): - m_partfile_rle comment now matches reality: the mutator is CState::MutateDownloads under the State write lock, NOT m_ec_mtx. m_ec_mtx is incidentally held across the same call stack but isn't the actual serializer. webapi Medium: - ParseEcidPath now uses std::strtoull with errno + ULLONG_MAX + > 0xFFFFFFFF gate. On Windows `unsigned long` is 32-bit, so the prior > 0xFFFFFFFFu overflow check was a tautology and a bogus high-value path segment would saturate to ULONG_MAX and silently match an actual ECID 0xFFFFFFFF. - MakeSetCookie / MakeClearCookie use std::string instead of a 256-byte snprintf buffer. Today's HS256 JWT fits; tomorrow's longer payload (extra claim, alg switch) won't truncate. tests: - AuthTest gains (1) RateLimiter_LockoutExpiresAfterLockoutSeconds (1.1 s real-time test for "infinite lockout" regression), (2) RateLimiter_SlidingWindowSplitAttemptsStillLockOut for the ring-buffer behaviour the old tumbling impl would have failed, and (3) RevocationListBlocksOtherwiseValidToken — the cross-test the review flagged as missing, composing CJwt::Verify with CRevocationSet so a regression in either component's lookup fails the test. - run-all.sh: pkill scoped to `--config-dir=/tmp/amuleapi-regtest` so a dev's open editor doesn't get killed. Daemon readiness wait is now `curl /version` polled with timeout instead of `sleep 1` (matches the line-19 comment about cold-start cost). A 429 probe after a failed phase prints a "phase3 left the lockout armed; restart amuleapi" tip so the next dev doesn't lose an hour. - phase5a.sh / phase8b.sh / phase11.sh: cleanup trap now also DELETEs the Ubuntu ISO test partfile (when ADMIN_TOKEN is available) so a failed run doesn't survive into the next as 6.6 GB of disk pressure on the Windows VM. docs/QUICKSTART-AMULEAPI.md: - Drop the broken `docs/api/` cross-reference (the directory doesn't exist yet); point at the source-of-truth handler routing in `src/webapi/Api.cpp` instead. - Add the amuled-cohabitation callout above the config-dir table (same data dir as amule.conf; no collisions; both daemons read disjoint files). - Reword the `--set-admin-pass` / `--set-guest-pass` block to explicitly say (a) no HTTP server is started, (b) no EC connection is attempted, (c) the exit code reflects success / failure so chains like `amuleapi --set-admin-pass=... && systemctl restart amuleapi` short-circuit on write failure. - CORS section gains a callout: the empty-allowlist form echoes the caller's Origin (not literal `*`), because `Access-Control-Allow-Credentials: true` is incompatible with the literal wildcard and cookie-auth would break otherwise. Verified locally: - ctest: 21/21 (new AuthTest cases included) - run-all.sh standalone (after restart): phase8a 11/11, phase8c 7/7, phase8d 9/9, phase11 30/30. (Without a daemon restart immediately after phase3 the new sliding-window correctly refuses login — that's the intent; the orchestrator's per-phase restart clears the bucket.)
- RateLimiter_SlidingWindowSplitAttemptsStillLockOut now actually discriminates. The prior timing (window=5, threshold=3, single sleep 1.1 s between failures) passed against both the old tumbling impl and the new sliding impl — the failures landed inside the same window from the OLD code's perspective so the reset never fired and the count went 1→2→3 regardless. Tightened to (window=3, threshold=3) with failures at integer-second offsets t=0, 3, 4, 5 — at t=4 the OLD code triggers the wholesale reset and the count drops to 1; at t=5 OLD count=2 (no lockout) while NEW evicts the t=0/t=1 stamps and tallies 3 within the live 3-second window (lockout). Verified with a standalone harness replaying the sequence against the OLD impl: locked=NO, count=2. - Session::~Session now wraps the worker-exit invariant in a hand- rolled `if (!flag) std::abort();` instead of plain `assert()`, so the check survives -DNDEBUG (compiled into every Release / RelWithDebInfo build). The worker lambda sets the flag via an RAII WorkerExitMarker struct rather than a bare end-of-body statement, so a future refactor that adds an `early return` after the catch block still trips the flag before `self` drops. - run-all.sh's final-teardown `pkill -f amuleapi` was the last un-scoped sweep (the per-phase one at line 53 was already narrowed to `--config-dir=/tmp/amuleapi-regtest`). Now the teardown uses the same pattern, so a developer with `vim path/to/amuleapi.cpp` open keeps their editor across full-matrix runs too. - docs/QUICKSTART-AMULEAPI.md: the cohabitation-with-amuled callout now sits ABOVE the per-platform config-dir table (the layout the prior commit message claimed but didn't actually deliver). The intro line + callout + "default location" + table reads more linearly too.
- src/libwebcommon/picojson.LICENSE: reproduce picojson 1.3.0's BSD-2
notice next to the vendored header, mirroring how muleunit ships
unittests/muleunit/license.txt. BSD-2 clause 2 obliges any binary
distribution (AppImage, Flatpak, .deb, macOS bundle, Windows
installer) to include the notice in materials provided with the
distribution — without this file the shipped binaries were
technically out of compliance even though the source-form clause
was satisfied by picojson.h's in-file comment.
- docs/THIRDPARTY.md: full credits/license reproduction for picojson
(BSD-2) and muleunit (LGPL v2.1, test-binary-only — explains why
the relinking clause does not constrain end-user binaries since
muleunit is never linked into amule/amuled/amulegui/amuleweb/
amuleapi).
- LICENSE.md: short header pointing at docs/THIRDPARTY.md for the
third-party components, leaving the GPLv2 text unchanged below.
- src/libwebcommon/Jwt.cpp: count '{' + '[' in each b64-decoded
JSON section and reject when > 32 BEFORE calling picojson::parse
on either the header (Jwt.cpp:225) or the payload (Jwt.cpp:250).
picojson's _parse_array/_parse_object is unbounded recursive
descent; both parses run before the MAC verdict reaches the
caller, so an unauthenticated peer could submit a token in
Authorization: Bearer whose b64-decoded JSON nests deep enough
to blow the worker thread's stack (musl pthreads cap at 128 KiB).
Real JWTs are flat — a cap of 32 leaves ample headroom; the
scan is O(body length) with no allocations.
- src/webapi/Api.cpp: pull the same opener-count cap into
ParseJsonObjectBody so every mutating-body parse site is covered,
and route HandleLogin through ParseJsonObjectBody instead of
calling picojson::parse directly — HandleLogin is reachable
unauthenticated, so it needs the same defence as the JWT
surface. Forward declaration in the helpers-anon-namespace block
near the top of the file so HandleLogin (defined above
ParseJsonObjectBody) can call it.
- unittests/tests/JwtTest.cpp: two new regressions
(DeeplyNestedPayloadRejected, DeeplyNestedHeaderRejected) build
a token whose nesting is 200 levels deep and sign it with the
matching HMAC so the MAC compare succeeds and the depth cap is
the only available reject path. Suite goes 23 → 25 cases, all
21 ctest binaries still green.
Six batches landed against the final pre-PR review. All 21 ctest
binaries still green; JwtTest grew to 27 cases (depth-cap + iat
mandatory regressions); AuthTest dropped from 6.5 s → 0.44 s
after clock injection.
Batch 1 — quick wins
- HttpServer.cpp: route 5xx body through CJsonWriter::ValueString
(H-S1).
- Auth.cpp Check(): evict stale failure stamps the same way
NoteFailure does (H-S2).
- EventDiff.cpp: capture-on-first-call thread-id assertion in
EmitDiffsAndUpdate; hard abort if a second thread publishes
(H-S3).
- PathPatterns.{h,cpp} + Api.cpp: LooksMalicious(path) defence-
in-depth — reject NUL / %00 / %2e%2e / literal ".." segments
before routing (H-S11).
- Jwt.cpp Verify: reject tokens > 4 KiB before any Base64Url walk
(M-S10).
- Api.cpp ?channels= cap at 32 tokens (M-S13).
- Api.cpp HandleLogout: revoked-token now 200 noop, not 401
(M-S14).
- Api.cpp: kSessionCookieAttrs as the single source of truth
for Set/Clear cookie attribute strings (M-S11).
- App.h: rename PartfileRleState() →
PartfileRleStateRequireStateWriteLock() to encode the
precondition (M-S2).
- State.cpp: comment the tick-end-stamping skew on m_snapshot_at
and the "events atomic, state isn't" invariant (M-S3 + M-S12).
Batch 2 — auth / JWT hardening
- Jwt.cpp Verify: iat is now mandatory; additionally cap
(exp - iat) ≤ TOKEN_LIFETIME_SECONDS + skew so even a
compromised secret can't mint a 100-year token (H-S10).
- Api.cpp HandleLogin: drop the JWT from the JSON body by
default — browsers get the cookie only; bearer clients opt
in via Accept: application/jwt or ?type=bearer (H-S12).
- Jwt.h: signing secret now stored as CryptoPP::SecByteBlock so
the live key gets scrubbed at destruction; constructor also
zeroes the caller's vector after copying (M-S1).
- AmuleApiConfig.cpp: first-run amuleapi.conf write routed
through WriteFileAtomic0600 — no truncated configs on mid-
write crash (M-S5).
- Api.cpp: new generic 401 rate limiter (30/60s, 5-min lockout)
applied to every auth-protected endpoint via
AuthenticateRequestRateLimited, including HandleLogout's
inline-soft path (M-S7).
- Api.cpp ResolveServerEcidByAddress: drop the s.address
string fallback; require IPv4-shaped match (M-S9).
- curl-tests: opt every login URL into ?type=bearer so the
script harness keeps reading the token out of the body.
- unittests/tests/JwtTest.cpp: two new regression cases
(MissingIatRejected, ExpIatLifetimeCapExceeded) and
EmptyJtiRejected updated to include a valid iat so it
exercises empty-jti rather than the new mandatory-iat path.
Batch 3 — SSE concurrency
- Api.cpp DispatchEvents: post-Drain gap detection — if the
cursor fell off the ring between drains, emit synthetic
resync (reason=gap) and restart from newest. Previously
only reconnects caught gaps; a cold-start tick on a 5K-
download library would publish ~5K events, evict 4900,
and the live drainer would silently miss them (H-S4).
- EventBus.{h,cpp}: Shutdown() + IsShutdown() latch. App's
OnExit calls Shutdown() before tearing down the
dispatcher; Drain wakes blocked subscribers, the SSE
worker loop polls IsShutdown() and exits — closes the
UAF window where a detached worker held m_dispatcher
across dispatcher.reset() (H-S7).
- HttpServer.{h,cpp} + Api.{h,cpp}: new StreamingPreflight
callable runs synchronously on the I/O thread BEFORE the
SSE worker thread spawns and BEFORE the 32-slot
streaming-session budget is touched. Unauth peers get
401/429 without burning a slot for the read-timeout
window (H-S8).
- App.cpp refresher loop: wall-clock-bounded to a 1 s
target cycle (was a fixed 4×250 ms sleep regardless of
tick duration) + WARN log on any tick > 3 s — typically
signals EC-mutex contention (H-S5).
Batch 4 — cache / perf
- TtlCache.h: split GetOrFetch into "fresh? return cached"
vs "claim inflight via condvar, drop mu, run fetch,
broadcast". A 30 s amuled stall on /stats/tree no longer
parks every concurrent reader (H-S6).
- Api.h + Api.cpp: snapshot-versioned ETag memoization keyed
on (request target, m_state.SnapshotAt()). Previously every
200 GET/HEAD ran MD5 over the (potentially multi-MB) body;
on a 10K-shared-file daemon that was the dominant CPU cost
on the safe-method path (M-S6).
- Pagination (M-S8) deferred per user direction.
Batch 5 — tests
- Auth.h + Auth.cpp: clock injection into CRateLimiter via
optional Clock parameter (defaults to std::time(nullptr));
AuthTest's sliding-window / lockout-expiry tests now run
with a controllable clock instead of real-time sleeps —
AuthTest dropped from 6.5 s → 0.44 s (H-T1).
- JwtTest MalformedBase64Rejected: drop the misleading
"constant-time" framing since no timing assertion exists
(H-T2).
- phase4.sh: poll /status until 200 instead of sleep 2 —
doesn't drift to disconnected under Windows VM load
(H-T3).
- phase4.sh test 4: real guest login instead of the comment-
only stub; _die if guest pass isn't configured so the
fixture gap surfaces (H-T4).
- phase3.sh: add 5th-attempt 429 check (was only the 8th)
so the lockout boundary is pinned (M-TD6).
- StateTest: ASSERT_TRUE(observed > 1000) instead of > 0 so
the tear-detection harness can't pass on a single
observation (M-TD7).
- AmuleApiConfigTest: document why the mode-bit test is
POSIX-only and what a future Windows port would land
(M-TD8).
Batch 6 — docs / man / cmake
- AmuleApiConfig.cpp: document the deferred --rotate-jwt-
secret CLI; manual rotation via "delete the file + restart"
works today (M-S4).
- cmake/options.cmake BUILD_AMULEAPI block: document the
hard-fail dependency chain (cryptopp via NEED_LIB_EC,
Boost via unconditional cmake/boost.cmake) so a future
refactor doesn't accidentally let amuleapi soft-disable
(M-TD1).
- docs/QUICKSTART-AMULEAPI.md: callout disambiguating
--port=4712 (EC port amuleapi talks TO) vs the 4713 HTTP
listener (port REST clients talk TO); curl example
switched to ?type=bearer (M-TD2).
- docs/man/amuleapi.1.in: drop the non-existent -V short
flag from SYNOPSIS + OPTIONS; the wired form is
--version (M-TD3 + M-TD4).
- docs/man/CMakeLists.txt: comment-only ack of the
translated-manpage gap — adding amuleapi to MANPAGE_BINARIES
today fails the build because po/manpages-*.po has no
amuleapi strings. English-only until 3.1; explicit
wiring instructions in the comment for whoever lands the
translator pass (M-TD5).
…sweep Two fixture bugs surfaced when run-all.sh was actually re-run on top of the sweep commit; both were introduced by the batch-5 changes that addressed reviewer M-TD6 / H-T4. - phase3.sh — the 5th-attempt boundary assertion (M-TD6) had the off-by-one in the wrong direction. Rate-limiter Check() runs BEFORE the password compare; NoteFailure() runs AFTER. So with threshold=5, the 5th attempt arms the lockout but still returns 401 (the password compare failed and the response was already prepared); the FIRST 429 lands on attempt 6 when Check() sees the armed lockout. The assertion now asserts 401 on attempt 5 and 429 on attempt 6, which is what the limiter actually guarantees. - phase4.sh — the new guest-bearer check (H-T4) referenced $GUEST_PASS without setting a default. run-all.sh configures guestpass via --set-guest-pass, but a standalone phase4.sh invocation has no GUEST_PASS in the environment and `set -u` bails. Default to `guestpass` to match run-all.sh's daemon fixture, and document the env var alongside ADMIN_PASS in the script header.
…cOpsImpact) Seven findings from the second-round review. All 21 ctest binaries still green; run-all.sh full curl matrix still green (660+ assertions across 26 phase scripts). - HttpServer.cpp:153 — set explicit m_parser->header_limit(16 KiB). Beast's default header_limit varies across versions (8 KiB in current upstream, larger on older). Without an explicit cap, a drip-feed attacker who slowly streams header lines grows the flat_buffer until the 10 s read timeout catches them — but that's 10 s × pool of concurrent peers of memory pressure. The explicit cap pins the per-peer ceiling at 16 KiB regardless of upstream defaults. (amule-project#3) - HttpServer.cpp:200-220 — replace the e.what()-in-body 500 with a generic "internal server error" message; log e.what() server- side via stderr. Information-disclosure defence: a future header-driven throw site (or picojson's offset-with-byte error messages today) would otherwise reflect caller-supplied bytes (Authorization fragments, Cookie content) back to the requester via the 500 body. The H-S1 fix in the prior sweep handled JSON escaping but kept the disclosure surface; this closes it. (amule-project#2) - App.cpp TextShell — bind-time hard gate. Refuse to start when BindAddress is non-loopback AND both admin/guest passwords are empty. The prior WARN-only behaviour meant a daemon started against a routable interface BEFORE the operator ran --set-admin-pass accepted TCP, refused every login (503 login_disabled), but still exposed the unauth surface (/version) and would silently leak any future read-without-auth endpoint. Loopback binds are unaffected — that's the documented first-run flow. (amule-project#1) - HttpServer.cpp + App.cpp shutdown — Listener::Stop now cancels every live SSE session's socket via a new Session::RequestCancel posted onto the session's strand. The new g_live_streams weak_ptr registry tracks every session that holds a streaming slot. Without this, a worker blocked inside a synchronous Beast write to a slow peer would hold the connection until the kernel TCP timeout (~minutes) and CHttpServer::Stop would join the io_context thread indefinitely waiting for the workers. Now the close fires, the write fails EPIPE, the worker exits cleanly, the last shared_ptr ref drops, and shutdown completes in bounded time. (amule-project#6) - App.cpp refresher loop — fail-loud on sustained EC blackout. After ~30 s of consecutive failed ticks, log a sharp WARN to stderr; after ~5 min, set g_shutdownRequested so a supervisor (systemd, launchd, docker restart=always) can bring the pair back. Previously the refresher would silently loop forever retrying a dead amuled, with the daemon serving 503 ec_unavailable from every endpoint with no visibility into the underlying cause. (amule-project#7) - EventBus.{h,cpp} + EventDiff.cpp — new CEventBus::PublishBatch(vector<pair<name,data>>) takes one lock acquisition + one notify_all for the whole batch. DiffMap now accumulates removed/added/updated tuples and emits via the batch API. The cold-start tick on a 5K-download library was previously firing ~5K individual notify_all cycles inside the refresher loop (each going through every drainer's cv mutex), which dominated the tick wall-clock. Single-publisher invariant + monotonic id assignment both preserved (the batch assigns inside the same lock as the per-event Publish). (amule-project#9) - App.cpp OnRun — sigaction with explicit sa_mask instead of std::signal. std::signal has implementation-defined "reset-to-SIG_DFL after firing" semantics on some platforms (notably musl/Alpine and legacy BSDs); a second SIGINT could default-terminate the daemon mid-shutdown. sigaction with SA_RESTART gives portable stay-installed semantics across every relevant target and prevents EINTR on blocking EC reads during shutdown. Windows lacks sigaction; std::signal fallback retained there. (amule-project#5 — Alpine/musl support)
Two new files covering the complete v0 API surface: - docs/amuleapi/REFERENCE.md (~950 lines) Endpoint catalog: every route under /api/v0/, grouped by resource (system, auth, downloads, clients, shared, servers, categories, preferences, networks, kad, logs, stats, search). Per endpoint: HTTP method(s), auth role (NONE/GUEST/ADMIN), query/path params, request body schema, complete curl example, sample success response, and the error codes the handler can emit. Sections upfront for the shared model: auth carriers (bearer vs cookie), role gating, rate limiting, JWT structure, error envelope, ETag and conditional GET, CORS surface, path validation, request size limits. Closing error-code catalog so clients have one table to alert on. - docs/amuleapi/EVENTS.md (~310 lines) SSE contract: connect flow, headers, initial reassurance chunk, frame format, channel filter (downloads/shared/servers/clients/ status/logs), 15 s heartbeat semantics, Last-Event-ID reconnect matrix, live-path gap detection, and the synthetic resync event. Catalogs every event the bus emits with its payload shape: download_*, shared_*, server_*, client_*, status_changed, log_appended, plus the filter-bypass resync. Closing notes on the single-publisher invariant and shutdown behaviour. Cross-linked from: - docs/QUICKSTART-AMULEAPI.md (replaced the "OpenAPI is a follow-up" hedge with pointers to the new files). - docs/man/amuleapi.1.in (the man page promised docs/api/; updated to point at docs/amuleapi/REFERENCE.md and docs/amuleapi/EVENTS.md). No source changes; pure docs. Closes the "where's the API spec" gap that the man page was advertising against an empty directory.
The man page originally promised docs/api/; restore the directory name to match. Renames the two reference files in place and updates the cross-links in QUICKSTART-AMULEAPI.md and the amuleapi(1) man page. No content changes.
The doc examples carried 176.123.5.89:4725 verbatim, lifted from the phase11.sh curl regression test where it was used because it's a known-live ed2k server. That's fine for a smoke test but wrong for user-facing docs — implies endorsement and links a third-party operator's address to the project. Swap to 203.0.113.5:4242 (RFC 5737 TEST-NET-3, reserved for documentation) throughout REFERENCE.md and EVENTS.md and align the port in the example status response so the two examples stay consistent. No structural changes.
…nt cleanup
Code:
- EventDiff.cpp/h: every ToJson(*Snapshot) rewritten to emit the same
byte-for-byte shape as the matching REST list-item writer in
Api.cpp. Every Equal(*Snapshot) expanded to compare every emitted
field so SSE _updated fires on any movement (the original gap was
shared_updated never firing on xfer counter movement). New
ToJsonStatusEvent(Status, Kad, ec_connected) produces the nested
REST /status envelope; new Equal(KadSnapshot). EmitDiffsAndUpdate
reads state.Dashboard() so status + kad + ec_connected are diffed
against the prior tick as one rollup. LastSeenState gains
KadSnapshot kad + bool ec_connected.
- Api.cpp: HandleNetworksConnect accepts {"network":"ed2k"|"kad"|
"both"} matching /networks/disconnect; routes to EC_OP_SERVER_CONNECT
/ EC_OP_KAD_START / EC_OP_CONNECT. HandleKadConnect +
HandleKadDisconnect deleted as strict aliases of the granular form.
/search/results fires EC_OP_SEARCH_PROGRESS in the same single-flight
TtlCache miss; response carries progress.{percent, complete} so
empty results aren't ambiguous between "no search", "in flight",
"finished with zero hits". Api.h adds SearchCacheValue + new
TtlPair_Search alias.
Tests:
- phase5e.sh: replace dedicated /kad/{connect,disconnect} assertions
with /networks/{connect,disconnect} body-selector assertions; add
network=ed2k and bogus-selector negative cases.
- phase6.sh: assert progress envelope on /search/results.
Docs:
- REFERENCE.md: TOC linking every endpoint section; clear_completed
cross-references DELETE /downloads/{hash}; kad/bootstrap body
corrected from a wrong {server_address: ...} to the real {ip,
port} shape; /networks/connect doc updated for the body selector
and /kad/{connect,disconnect} removal noted; /search/results doc
shows progress envelope.
- EVENTS.md: every event sample replaced with the aligned shape;
cross-resource intro states _added/_updated are byte-identical to
the REST list-item shape they mirror.
Comment cleanup:
- Removed PLAN.md / PLAN §N / internal-phase-naming references from
in-code comments per the standing policy. Replaced "deferred to
v0.2" parentheticals with their removal. Section-divider phase
comments and pure phase-marker comments deleted. Strip is partial;
the bulk-apply tool tripped a regex bug; this commit is the
post-recovery point and a follow-up sweep will finish.
All 21 ctest binaries green; build clean.
d80bdc2 to
7016701
Compare
|
Great work. It will take me a few days to test everything. The initial impressions are good, but I have a few questions/concerns that I'll leave here. The first one is related to the workflow. I don't want to be working on huge pull requests for months. I would like a plan so we can gradually merge PRs without blocking bug-fixing releases. The first PRs can be large, but afterwards, I would like to work with small stabilization and bug-fixing PRs. Another concern I have is about how the event system works. I haven't looked closely at the source code, but I've seen that you make requests using the EC protocol every 1 second. If this is the case, I don't think we have a good architecture. The most common use case will be a low-power NAS server running the AmuleD and AmuleAPI processes 24/7. There is only 1 client and 99% of the time it is not connected to the web interface. It is a waste of resources for AmuleAPI to be continually synchronizing and serializing information when nobody is consuming it. These processes have to be as efficient as possible to spend the CPU/Memory resources on downloads and uploads. We should rethink this use case. I like events, but servers shouldn't waste resources when no one is connected. Regarding the events, they seem to be working well, but I think we can improve this. I have doubts about the scalability of the solution. My long-term goal is to have a WebUI with all the functionality of Amule / AmuleGUI. As you already know, I am sharing thousands of files, I have more than 100 concurrent uploads all the time... I don't know if the event system is prepared.
I'm going to need us to add the functionality of serving static resources in AmuleAPI. I have made an implementation here for my tests. => #220 I am testing several web technologies to see which one best fits the project. If you want to take a look, it is here, but I will probably throw everything away and start over. => #220 BugsIn this PR, I see that the curl test files have non-descriptive names. They should be renamed so that they make sense to other developers. The GET /shared endpoint is not working correctly. I have 3 downloads in progress and shared returns these results. The GET /categories endpoint behaves strangely. When there are no categories, it returns an empty list. When there is 1 category, it returns 2 elements (category 0 / all and the category I have added). This makes the API inconsistent. I would always return category 0. We need to verify that category 0 cannot be edited. The GET /search/results endpoint does not work properly. If I call it right after POST /search (global search) I get The download "parts" are not being sent. It is not necessary in the first version, but it is what is missing to be on par with amuleweb. |
|
Thanks for the detailed look. Quick triage: Workflow — agreed. Land amuleapi + the bug fixes below in this PR, then small stabilization PRs after. Idle polling — you're right that the refresher loops at 1 Hz unconditionally ( Event volume / "many events" — 3 downloads × 1 update/sec is normal for a live UI, but coalescing per-tick deltas into a single batched event would cut framing overhead substantially. Will look at this with the idle-poll work. Event volume / "same info" — the diff path has an Search events — will add a
Download Curl test file names — true, the Static-serve infrastructure — agreed, will fold it into this PR. Taking the implementation from #220 with three hardening fixes: symlink containment (resolve realpath, reject if it escapes static_root), file-size cap (16 MiB), and ETag/If-None-Match so subsequent loads hit 304 instead of full re-read. Default placeholder |
…Snapshot amuled's downloads (CPartFile in downloadqueue) and shared (CKnownFile in sharedfiles + partfiles with PARTFILE_SHARED=true) views share a single per-process ECID counter (CECID::s_IDCounter) and a unified encoder-side CObjTagMap. Mirroring that on the receiver side avoids the class of bugs we kept patching: - The /shared "ghost" rows we fixed in e57cbf9 were caused by the shared walker hitting an ECID that the downloads walker had already populated in a SEPARATE m_shared cache, with the PARTFILE_HASH child tag CValueMap-suppressed (already sent on a prior partfile-walker tick). We worked around it by composing a dl_identity_fallback map and passing it to the shared walker. - With one ECID-keyed FileSnapshot map, the shared walker finds the entry the downloads walker just touched, with hash + name already set. It just flips is_shared=true and merges its own fields. The fallback compose disappears. Schema: struct FileSnapshot { uint32_t ecid; string hash, name, ed2k_link; uint64_t size; string priority; bool is_downloading, is_shared; struct DownloadSide { ... } download; // partfile-walker fields struct SharedSide { ... } shared; // knownfile-walker fields }; Role transitions: - Downloads walker: on PARTFILE entry, sets is_downloading=true and merges f.download. On FILE_REMOVED, clears is_downloading + resets download to default; drops the entry entirely if !is_shared. - Shared walker: same shape for is_shared/f.shared. PARTFILE_SHARED= false clears is_shared and resets the shared sub-block; the entry stays in the unified map (entity eviction is FILE_REMOVED's job). - Order-independent: running downloads-then-shared and shared-then- downloads converge to the same final state. Reset-on-transition guarantees /downloads can never serve stale upload stats, and /shared can never serve stale download stats — same lookup hitting either view always reads from a freshly-walked sub-block. API surface unchanged: GET /downloads filters m_files by is_downloading; GET /shared filters by is_shared. Same wire shape; clients see no difference. Bonus from the unified map: FindDownload(hash) / FindShared(hash) now hit a parallel std::unordered_map<hash, ECID> index (m_hash_to_ecid) that's rebuilt at the end of every MutateDownloads / MutateShared. Was O(N) linear scan; now O(1) hash lookup + O(log N) map find + role-flag check. REFERENCE.md updated. RefresherTest migrated to FileSnapshot + role-flag assertions (32 cases, all pass). EventDiff diffs the unified files map with role-flag awareness: download_{added,updated,removed} fires on is_downloading transitions, shared_{added,updated,removed} on is_shared transitions. A single tick can emit both for one ECID (e.g. partfile becoming shared). Full curl suite green: 27/27. Refs amule-project#201 review.
173c5f9 to
03700a6
Compare
|
Five commits since the last update:
Packaging fired on the fork: https://github.com/got3nks/amule/actions/runs/27907062647 Performance numbers, addressing the earlier concernsRan CPU + bandwidth measurements on a production node (busy uploader at ~33 MB/s) to put real numbers on the perf discussion:
Reading those:
For the cellular-bandwidth concern, the cheap mitigation is HTTP For the "disconnect amuled state baseline after N min of idle" idea — CPU doesn't look like the bottleneck on this sample (0.20% idle, +0.07% per active subscriber), but I'd want measurements on constrained hardware (low-end ARM NAS) and on a huge library before making a call. A lighter intermediate is to stretch For comparison, amulegui uses the same per-second polling pattern while open ( @ngosang — I think we're at a point where this can merge into master once you've verified the fixes + features land cleanly on your side. It's strictly additive — no amuleweb legacy paths touched, the existing REST/SSE wire shape is unchanged, and the bandwidth / idle-tick-rate follow-ups are tracked separately so they don't gate the merge. |
|
I have rebased my PR on top of this one and it works fine.
Just a couple of comments. I see that some endpoints accept the md4 hash and the ecid as parameters. IMHO, I think the ecid is something internal and should not be exposed in the public API. I think we should work only with the MD4 hashes of the downloads. There are several reasons not to expose the ecid:
For all these reasons, I think it would be better to work only with md4 hashes from the beginning and do the conversion in the backend between the hash and the ecid when necessary. I would not accept it as an identifier in the URLs nor would I return it in the body of the responses and events. I would always work with the md4 hash. Regarding this PR, you can merge it whenever you consider appropriate. As for me, we can continue working without merging. I don't want to merge my PR until I have something more or less stable. I still need to evaluate the best solution to make the design responsive on smartphones, for internationalization, and whether to use a library to render the charts. It's going to take me quite some time, several weeks, to have something I am satisfied enough with to merge. I will surely be asking you for some changes to the APIs. This is a feature meant for the next release, therefore I am not giving it top priority. I am working on other projects at the same time. |
|
Good point. The ECID exposure was initially load-bearing because After the unified
|
Instead of calling it "key" or "ecid" or "id" could you call it "hash" or something similar everywhere? I think it's more clear. |
Hashes (MD4, 32-char hex) are the only identifier the API exposes for
partfiles and shared files. File ECIDs were never wire-public:
- /downloads/{key} → /downloads/{hash} (PATCH/DELETE/detail)
- /shared/{key} → /shared/{hash}
- removed "ecid" from /downloads, /shared, search-result event payloads
- /clients: upload_file_ecid / download_file_ecid → upload_file_hash /
download_file_hash. The walker now resolves EC_TAG_CLIENT_UPLOAD_FILE
and EC_TAG_CLIENT_REQUEST_FILE into MD4 hashes at refresher-tick time
using an ECID→hash snapshot built from the unified file map.
Client and server ECIDs stay — they are still the URL key for
/clients/{ecid} and /servers/{ecid}, and there's no hash for either.
Tests updated: curl phases 04 + 17 drop the file-ECID assertions and
the PATCH /shared/{ECID} round-trip. All 22 cpp unit tests still pass;
all 27 curl phases still pass against the local amuled.
Closes the file-ECID strip discussed in amule-project#201.
Disambiguates from the file ECIDs we just stripped — `ecid` alone made
it look like a file identifier, when it's actually the peer's identity.
Affected surfaces:
- GET /clients[] payload
- client_added / client_updated SSE payloads
- client_removed SSE payload (`{ "client_ecid": N }`)
server_ecid stays as plain `ecid` — same field name as before, still the
URL key for /servers/{ecid}.
Curl phases 10 + 26 updated. All 22 cpp unit tests + 4 affected curl
phases (10, 22, 24, 26) pass.
CState's unified file map was std::map<uint32_t, FileSnapshot> with a parallel std::unordered_map<string, uint32_t> hash→ECID index that got rebuilt from scratch on every MutateDownloads / MutateShared call (one rebuild per refresher tick). That O(N) per-tick scan was the dominant cost on large libraries — 100k files × 2 walkers × 1 Hz = 200k iterations per second spent re-deriving an index that almost never changed. Replace both with a single FileMap wrapper class that owns both data structures: * std::unordered_map<uint32_t, FileSnapshot> for ECID-keyed storage * std::unordered_map<string, uint32_t> for hash → ECID lookups emplace() and erase() update the index inline (O(1) avg per change); hash never mutates within an ECID so no other surface needs hooks. Walkers keep their .find()/.emplace()/.erase() calls unchanged — the wrapper exposes the same iterator surface. Per-tick rebuild deleted. Lookups now both O(1) avg: * ECID → FileSnapshot : FileMap::find(ecid) * hash → FileSnapshot : FindEcidByHash(hash) + find(ecid) CState::Downloads() / Shared() / Files() vectors are no longer in ECID-ascending order (unordered_map iteration is bucket-defined). Consumers already sort on their side (ngosang's feedback on PR amule-project#201), so this is wire-visible but harmless. REFERENCE.md tidied — the previous /downloads/{key} prose was left over from the file-ECID-strip pass; consistent {hash} naming throughout now, no internal-state details leaking onto the public API surface. 22/22 cpp unit tests + 27/27 curl phases pass.
|
@ngosang — the three follow-up commits just landed:
All path placeholders in Worth rebasing the front-end PR #220 on top — these are architectural changes and the wire format moved (file ECIDs disappeared from Plan from this side: leave PR #201 in draft (no merge) until 3.0.1 ships, which buys breathing room for the front-end testing pass to surface any other bugs / feature requests on amuleapi. Probably worth promoting the API to |
8025917 to
7a9ae82
Compare
EVENTS.md never spelled out the call ordering for the snapshot + stream combination. The naïve REST-first → SSE-second pattern silently drops events that fire between the two — a one-line bootstrap bug that's easy to write and hard to spot in testing. New "Bootstrap: snapshot + stream" section explicitly: - Names the race (REST→SSE loses events fired in the gap) - Documents the correct order (open SSE first, then GET, apply events as upserts on top of the snapshot) - Explains why the merge is consistent (events for tick N are emitted after the state mutation for tick N is committed) - Gives a JS bootstrap example with parallel snapshot fetches - Points at the resync recovery rules for the failure modes
7a9ae82 to
6f32a7a
Compare
Bootstrap-companion gaps the doc didn't cover, found while sanity-checking the bootstrap example against real-world client paths: - Connecting / Browser example: add an `error` listener with the `readyState === CLOSED` terminal-failure check, and clarify that the native browser EventSource API can't carry an `Authorization` header so browser bearer-on-SSE requires a polyfill. - Channels and filtering: note that the snapshot fetches in the bootstrap must mirror the `?channels=` filter; pulling unsubscribed collections leaves them silently stale. - Event catalog intro: call out that `status_changed` is a full envelope replacement and `log_appended` is an append — they don't fit the `_added` / `_updated` / `_removed` upsert pattern. - Search / Logs channel sections: explicit pointer back to Bootstrap noting that `/search/results` and `/logs/amule` aren't fetched there by default, with the case for adding them. Plus a tidy-up: every `§Section` cross-ref now uses a clickable markdown anchor link instead of the section-symbol convention (matches the REFERENCE.md links and renders properly on GitHub).
The existing EC_TAG_SEARCH_STATUS overloads a single uint32 to mean a lot of different things — raw=0 covers "no search", "Kad in progress", "global queue not yet populated", and "global just finished"; raw=100 covers "global queue empty at start" and "global all done briefly". 0xfffe = Kad done, 0xffff = global done (or local in progress). The desktop GUI and amulecmd live with this because their UI state + explicit notifies cover the gaps; for headless consumers driving a JSON wire surface (amuleapi) it forces a defensive state machine (saw_in_progress tracking, 30 s timeout, started_at bump-on-result). Add three new optional tags on the EC_OP_SEARCH_PROGRESS response so modern consumers can read the lifecycle directly without the sentinel decode, while pre-3.1 consumers (amulegui, amulecmd, amuleweb) keep reading EC_TAG_SEARCH_STATUS unchanged: EC_TAG_SEARCH_LIFECYCLE_STATE uint8 0=idle / 1=running / 2=finished EC_TAG_SEARCH_LIFECYCLE_KIND uint8 0=local / 1=global / 2=kad EC_TAG_SEARCH_RESULT_COUNT uint32 size of m_results[m_currentSearch] Derivation in CSearchList::GetSearchLifecycleState(): - m_currentSearch == -1 → IDLE (no search this session) - m_searchType == KadSearch → m_KadSearchFinished ? FINISHED : RUNNING - otherwise (Local / Global) → m_searchInProgress ? RUNNING : FINISHED EC_TAG_SEARCH_STATUS is unchanged. amulegui / amulecmd / amuleweb iterate tags they recognise and skip unknowns — verified by a full build of all four binaries (amule, amuled, amulegui, amulecmd) against the same EC core. No per-searchID tracking yet — Get_EC_Response_Search still pins the sentinel 0xffffffff. Lifting that is a separate (larger) change.
amuleapi pins a daemon version carrying the new EC_TAG_SEARCH_LIFECYCLE_* tags landed in the previous commit, so there's no reason to keep the sentinel-disambiguation state machine around as a fallback. Replace it with a single ten-line mapping from (lifecycle_state, raw_pct) to (active, complete, percent). Refresher.cpp/h: AdvanceSearchProgressFromLifecycle becomes the only AdvanceSearchProgress. The legacy state machine, its 60-line comment block, and the defensive 30 s timeout / saw_in_progress / Kad-deadline- bump-on-result tricks are all gone. RefresherTick.cpp: drops the two-path branch that picked between legacy and lifecycle. Reads both EC_TAG_SEARCH_STATUS (for the global percent) and EC_TAG_SEARCH_LIFECYCLE_STATE every tick; passes both to the helper. State.h: SearchProgressSnapshot loses saw_in_progress, raw, started_at, last_results_count — all of which only existed to drive the old disambiguation. Keeps active / kind / percent / complete (the four fields surfaced by GET /search/results and the search_finished SSE). State.cpp: MarkSearchStarted no longer stamps started_at — that field no longer exists. RefresherTest.cpp: 12 legacy tests covering every branch of the sentinel-decode machine drop to 5 trivial tests of the new mapping (running with global percent, running masks percent for Kad / local, finished sets complete, idle defensive zero). Net diff: -201 lines, same surface contract on /search/results and search_finished, and zero magic-number sentinel decoding on the amuleapi side. 22/22 cpp unit tests + 27/27 curl phases pass against the patched daemon.
|
Two more commits landed on the branch:
Heads-up for testing: since amuleapi pins the new tags as required, this branch's Cross-platform builds for this branch (amuled + amulegui + amuleapi + amulecmd + amuleweb, plus AppImage / Flatpak / macOS Universal2 / Windows) are running here — artifacts will be downloadable when the run finishes: |
|
I updated my PR and I found some errors in the documentation + 1 outdated comment in the code => e134f06 1. Search progress eventThe amuleapi SSE RequestExpose search progress over SSE so clients can drive the whole search lifecycle from events and drop the periodic
Why this is cheap to addThe daemon already computes the normalized progress every refresher tick ( 2. For KAD search
It's possible to estimate the % based on time? For example, it there is a timeout we can update the progress +10% when time pass. |
|
Forgot to port the REST API updates over to the SSE events, will do tomorrow. |
EC_OP_SEARCH_PROGRESS gains a fourth lifecycle tag, EC_TAG_SEARCH_LIFECYCLE_PERCENT (0x070D): a unified 0..100 completion the daemon computes for every search kind. Global reuses the real server-queue percent; Kad -- which has no measurable mid-flight progress -- gets a cosmetic time-ramp off the fixed SEARCHKEYWORD_LIFETIME (45 s), capped at 99 until the authoritative finished lifecycle state snaps it to 100 (an early finish on SEARCHKEYWORD_TOTAL snaps it too); local is instantaneous. CSearchList stamps m_searchStart in StartNewSearch and derives the value in GetSearchLifecyclePercent(). EC_TAG_SEARCH_STATUS is unchanged on the wire, so amulegui / amulecmd / amuleweb keep working untouched -- only modern consumers (amuleapi) read the new tag.
…es search_finished)
Port search status to the event channel. EventDiff now emits a
search_progress event on every percent change while a search runs and on
the running->finished edge; the terminal frame carries state="finished",
so the old standalone search_finished event is retired (it was just the
completion case). Payload: {state, percent, results, kind}.
The refresher reads the daemon's new EC_TAG_SEARCH_LIFECYCLE_PERCENT
directly, so AdvanceSearchProgress drops its kind-special-casing and the
old EC_TAG_SEARCH_STATUS percent decode -- the daemon already computes a
clean 0..100 for every kind (Kad ramp included). GET /search/results'
progress object is brought to parity with the event: state/kind/percent,
with the canonical state string replacing the redundant complete/active
booleans.
Tests: RefresherTest gains pass-through/clamp cases for the unified
percent; curl 19-search adds a live Kad-ramp assertion (percent climbs,
capped at 99 while running); 22/26 assert search_progress in place of
search_finished.
…drift Document the new search_progress event and the REST progress object (state/kind/percent), folding in the retired search_finished. Also correct documentation drift confirmed against the actual JSON writers (several flagged in amule-project#220): /status gains kad.network; /clients gains the full peer field set (software_version, os_info, ident_state, xfer, queue/score/obfuscation/friend_slot); /servers gains description/version/max_users/ping_ms/failed; /kad gains firewalled_udp/in_lan_mode/ip/indexed/buddy; /logs/amule is {lines,total_cached,returned}; /logs/serverinfo is a {text,*_bytes} blob; /stats/tree is {nodes:[{label,children}]}; /stats/graphs is rich point objects with a width param; /search/results entries use sources:{total,complete}+already_have+rating. Drops the stale per-GET search TTL-cache note (search is refresher-driven now) and the log_appended {buffer} field / phantom search_result_added ecid.
549a1c6 to
d87b0ea
Compare
|
Follow-up commits on
Because the EC tag is a wire change, this needs a freshly built amule/amuled — prebuilt binaries (appimage/flatpak/macos/windows) land on the packaging run here once it finishes: https://github.com/got3nks/amule/actions/runs/28018726308 @ngosang — heads-up that your frontend in #220 will need a rebase onto this; and since |
…ct progress doc claim
search_result_added now emits `sources:{total,complete}` byte-for-byte
identical to WriteSearchObject (the /search/results[] entry), like every
other collection event mirrors its REST writer. It previously flattened
to sibling `sources` + `complete_sources` integers -- the lone event
that diverged from its REST shape.
Also corrects the overclaim that the REST /search/results `progress`
object mirrors the search_progress SSE event 'field-for-field': the
event additionally carries a `results` count (it has no results array
beside it). Fixes the matching stale comment in Api.cpp plus the
EVENTS.md / 19-search.sh references to the removed progress.complete
output boolean (state is canonical now).
|
@ngosang — one more
{ "hash": "…", "name": "…", "size": …, "sources": { "total": 12, "complete": 7 }, "already_have": false, "rating": 0 }So a single result-rendering function can handle both the Refreshed binaries (built off |
Supersedes #132.
Summary
This PR ships a standalone REST API + Server-Sent Events daemon,
amuleapi, that talks toamuledas a normal EC client. The API lives at/api/v0/*; the SSE stream is at/api/v0/events. Full per-endpoint contracts live in docs/api/REFERENCE.md; the SSE event catalog with Last-Event-ID reconnect semantics lives in docs/api/EVENTS.md; first-run setup in docs/QUICKSTART-AMULEAPI.md. ETag (snapshot-versioned, memoised on(target, snapshot_at)), CORS (preflight + credentialed origin echo with allowlist), and the SSE framing/reconnect protocol are all documented under those two files.Why standalone instead of bolting onto amuleweb
The first cut of this work was implemented inside amuleweb. We scrapped it for three reasons.
Stable amuleweb stays untouched. Merging this PR adds a new binary; it does not modify amuleweb. A 3.0.1 bugfix release can ship from master without any regression risk against the amuleweb users have been running for years. amule can ship both binaries side-by-side for a transitional period; whoever wants the REST surface enables
amuleapi, whoever's on amuleweb's PHP keeps it. Operators can adopt at their own pace.Concurrency efficiency. amuleweb runs a single long-lived EC connection too, but every request handles synchronously on the wx event-loop thread (no per-request thread spawn — see
src/webserver/src/WebSocket.cpp:282whereProcessURLis dispatched inline) and each PHP page does its ownSendRecvMsg_v2roundtrips against amuled with no cross-request snapshot cache. Adding a long-lived SSE channel that drains events for the lifetime of anEventSourcewould pin the event-loop thread, and bolting on the snapshot cache the REST path needs is a non-trivial refactor of the bespoke PHP interpreter. amuleapi takes the opposite shape: one EC connection feeding a 1 s refresher loop that maintains a coherent state snapshot every HTTP handler reads under ashared_timed_mutex, plus an in-process event bus the refresher publishes deltas to for SSE. No bespoke PHP runtime, no per-request EC roundtrip storm, no event-loop thread to tie up.Modular architecture eases future integration into amuled. The HTTP, SSE, auth, ETag, state-snapshot, and event-bus scaffolding (~6000 LOC across 12 files) is EC-agnostic and ships into any future in-process integration unchanged. The EC plumbing is concentrated in three files (
Refresher.cpp,RefresherTick.cpp,App.cpp's entry point) plus every individual mutation and lazy-cache handler inApi.cpp. Inside amuled those become direct model accesses againstCamuleApp::theApp()— the code gets simpler (no wire-format walking) but each handler needs touch-up. This is not a deliverable for this PR — just a property the architecture preserves.Performance model — refresher vs lazy
Two distinct caching paths land in this PR.
Refresher-driven items (every tick, snapshot-coherent).
/downloads,/shared,/servers,/clients,/status,/categoriesare populated by a 1 s loop that runs two EC calls per tick:EC_OP_GET_UPDATEatEC_DETAIL_INC_UPDATEfor the four list substructs, plusEC_OP_STAT_REQfor the status envelope. TheEC_DETAIL_INC_UPDATEdetail level uses amuled's per-clientCValueMapto ship field-level deltas — first-encounter rows carry full identity (NAME, HASH, SIZE, ED2K_LINK), subsequent ticks carry only the bytes that actually moved. TheEC_DETAIL_INC_UPDATEmechanism itself dates to 2005, but the wave of fixes that landed in 3.0.0 (full-detail emission for unseen ECIDs, the EC skip-unchanged series for per-file change tracking, alive-marker handling) are what make this work cleanly for a fresh subscriber on a cold start. One wire roundtrip per second, packet size proportional to actual change. The refresher then computes a diff against the prior snapshot and publishes one event per changed entity onto the in-process event bus via a single batchedPublishBatchcall, so SSE subscribers see push notifications within ~1 s. Mutating REST calls (POST/PATCH/DELETE) callRefresherTicksynchronously on the way out (under the samem_ec_mtx), so the response and the next GET both reflect post-mutation state.Lazy-fetched items (TTL-coalesced, mutation-invalidated).
/stats/tree,/stats/graphs/{graph},/logs/serverinfo, and/search/resultsfetch on demand with a 1 s single-flight TTL cache (CTtlCache::GetOrFetch). The first concurrent request runs the EC roundtrip; every other reader hitting the same endpoint in the same second parks on a condvar inside the cache, then reads the just-stored value — no duplicate roundtrips, no lock held across the EC call (the cache mutex is dropped during fetch so a 30 s amuled stall on/stats/treedoesn't park every reader of unrelated lazy endpoints). Mutation handlers explicitly invalidate the affected lazy cache:POST /searchinvalidatesm_search_cache,DELETE /logs/serverinfoinvalidatesm_server_info_cache, etc. Smoke tests verify the next read after a mutation sees fresh data, not the pre-mutation snapshot.The ETag layer caches
(target, snapshot_at)→ MD5 so safe-method GETs on a snapshot-stable target reuse the validator without re-hashing the body.If-None-Matchmatches → 304 with the ETag preserved per RFC 7232 §4.1.Wire-level surface in numbers
downloads,shared,servers,clients,status,logs) and the synthetic always-deliveredresynceventiat, lifetime cap enforced on Verify) via eitherAuthorization: Beareror an HttpOnlySameSite=Strictcookie/auth/login(operator-configurable thresholds) plus a generic 30-failures/60s lockout across every other 401-returning endpointTesting
Two layers in this PR; both green at the tip.
C++ unit tests via ctest — 21 binaries, including
JwtTest(27 cases covering crypto identity, alg/typ/iat invariants, depth-cap rejection, base64url edge cases),AuthTest(rate-limit clock injection so the suite runs in 0.44 s instead of 6.5 s),EventBusTest,EventDiffTest,StateTest(concurrent reader/writer tear-detection at 1000+ observations),RefresherTest,AmuleApiConfigTest(0600 enforcement on POSIX),JsonWriterTest,EtagTest,PathPatternsTest. The suite shapes the binary's contract; CI runs ctest on the same matrix it builds.End-to-end curl regression matrix — 26 phase scripts under unittests/curl-tests/amuleapi/, 660+ assertions total, driven by run-all.sh. The orchestrator brings up a fresh
amuleapidaemon per phase (fresh config dir, fresh rate-limit buckets, fresh state cache) against a configuredamuled, runs the phase script, aggregates pass/fail. This is the "does the wire contract actually match the docs" layer — every endpoint, every error path, every method gate, every CORS preflight, every SSE replay scenario.CI artifacts
Build matrix at https://github.com/got3nks/amule/actions/runs/27831394815 — AppImage (x86_64 + aarch64), Flatpak (x86_64 + aarch64), macOS .app (arm64 + x86_64), Windows portable .zip (arm64 + x64). Reviewers can grab the binary for their platform from the run's artifacts and exercise it against any amuled.
Frontend plan
A separate effort (not in this PR, not by me) will build a plain JS + CSS + HTML SPA that consumes this API. Two-phase rollout:
Development phase — host the SPA separately, CORS at amuleapi. Set
AllowCORS=1andCorsOriginAllowlist=https://your-app-hostinamuleapi.confand serve the SPA from any web server, includingpython -m http.serveragainst a build directory. The CORS bundle already echoesAccess-Control-Allow-Credentials: trueso cookie auth works cross-origin out of the box.Deployment phase — fold static-file serving into amuleapi. Add a
[Server]/StaticRoot=/path/to/distconfig key and a fallback handler inDispatchToHandlerthat resolves non-/api/v0paths against the configured root (with the path-traversal guardLooksMalicious(path)already applies, a MIME-type table, and SPA-history fallback toindex.html). The architecture already has the pieces:CHttpServer::Response.bodyis astd::string,content_typeis per-response. This is a future PR; the current scope is "REST + SSE surface ready to serve".Transitional period
Everything in this PR is in flight in the sense that the API surface is
/api/v0/, frozen against breaking changes for the lifetime of v0, but the architecture is up for discussion. If a better design appears mid-3.x, we can pivot — amuleapi is opt-in viaBUILD_AMULEAPI=YESand operators not running it are unaffected. The standalone-daemon shape minimises the cost of a rethink relative to what an amuleweb merger would have locked in.