fix(core): implement true endgame mode with redundant requests to pre…#225
Cflsft wants to merge 4 commits into
|
Thanks for the follow-up. The diff is much tighter than #224 (4 files, no scope creep) and the symptom is real. But I think the proposed shape (a parallel endgame path with redundant requests) is the wrong tool for what aMule already has, and the implementation underneath has issues independent of that. My recommendation is to redirect this PR toward improving the existing mechanism rather than adding a new one.

aMule already has the right shape; it's just narrowly triggered

The "stuck at 99% on a slow source" case has a working remedy today, in
That's the correct strategy: rotate work away from a slow source onto a fast one when one becomes available, with explicit cancellation, zero bandwidth waste. Same end result your endgame mode is after, achieved without redundant requests. Failure modes that match the "stuck at 99%" symptom:
Each of those is a small, well-scoped tweak: relax DROP_FACTOR near completion, remove the pref gate when remaining < N parts, add a periodic stale-assignment scan, special-case "one chunk left + single source" to force a rotation when any new source for the file becomes known. Any one of those is a much safer change than a new redundant-request path.

Why the current PR shape is also problematic on its own merits

Even if we ended up wanting endgame-style redundant requests, this implementation has three issues that would block it:
|
|
Correction / additional context to my earlier comment: I undersold how narrow that existing rotation path actually is in practice. That makes your point about the "stuck at 99%" symptom more valid than my earlier comment gave it credit for, and it also opens a much smaller, safer fix than either your endgame design or the per-knob tweaks I listed.

Concrete suggestion for a follow-up PR: make
Concrete site to put the gate:

    bool nearCompletion =
        m_reqfile && m_reqfile->GetPartCount() > 4 &&
        m_reqfile->GetIncompletePartCount() <= 4;
    if (thePrefs::GetDropSlowSources() || nearCompletion) {
        slower_client = m_reqfile->GetSlowerDownloadingClient(m_lastaverage, this);
    }

That gives you the symptom fix you're after, with no concurrency rework, no redundant requests, no bandwidth-claim wiring, and a clean small-file guard. ~3 lines of substantive change plus a helper.
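The gate above can be exercised in isolation. The following is a minimal sketch, assuming a hypothetical `PartCounts` struct in place of the real part-file object; `IsNearCompletion` and `ShouldLookForSlowerClient` are illustrative names, not existing aMule functions. It only shows how the small-file guard and the preference interact.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the part counts a part-file object would expose.
struct PartCounts {
    std::uint32_t total;       // total number of parts in the file
    std::uint32_t incomplete;  // parts that are not yet fully downloaded
};

// Threshold taken from the snippet in the comment above.
constexpr std::uint32_t kNearCompletionParts = 4;

// True only when the file is larger than the threshold (small-file guard)
// and at most kNearCompletionParts parts remain.
bool IsNearCompletion(const PartCounts& f) {
    return f.total > kNearCompletionParts &&
           f.incomplete <= kNearCompletionParts;
}

// Rotation is considered when the user preference is on, or when the
// file is near completion regardless of the preference.
bool ShouldLookForSlowerClient(bool dropSlowSourcesPref, const PartCounts& f) {
    return dropSlowSourcesPref || IsNearCompletion(f);
}
```

With these definitions a 3-part file never counts as near completion, so the preference alone decides for small files.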
7e61b30 to
3a4d279
|
@Cflsft much better: the diff on the latest commit (

That said, this isn't mergeable on inspection alone. The reason:

So before merging we need a test cycle from you that demonstrates:
Paste the relevant log excerpts as text in code fences (not screenshots), along with the aMule rev + platform you tested on. If everything checks out and we don't see latent issues surface, this is ready to land. |
730d6a0 to
3a4d279
|
@Cflsft this is a real improvement on
That said, the fix-3 branch is incomplete and slightly leaky:

    } else {
        AddDebugLogLineN(logLocalClient,
            "Local Client: OP_CANCELTRANSFER (freed blocks not available here) to " + GetFullIP());
        slower_client = this;
        slower_client->SetDownloadState(DS_NONEEDEDPARTS);
        return;
    }

The log line says

Suggested shape: actually send the cancel so the log matches reality:

    } else {
        if (!GetSentCancelTransfer()) {
            CPacket* packet = new CPacket(OP_CANCELTRANSFER, 0, OP_EDONKEYPROT);
            theStats::AddUpOverheadFileRequest(packet->GetPacketSize());
            ClearDownloadBlockRequests();
            SendPacket(packet, true, true);
            SetSentCancelTransfer(1);
        }
        AddDebugLogLineN(logLocalClient,
            "Local Client: OP_CANCELTRANSFER (freed blocks not available here) to " + GetFullIP());
        SetDownloadState(DS_NONEEDEDPARTS);
        return;
    }

Two minor nits while you're in there:
Testing still required before merge. The previous review's test cycle still applies and matters more now that the diff activates the rotation path widely. Per my earlier comment, please run a Debug build with |
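The cancel-once shape suggested in this review can be modelled outside aMule to check that a repeated drop does not send a second cancel. This is a sketch under stated assumptions: `MockSource`, `GracefulDrop`, and the string-based packet log are hypothetical stand-ins, not aMule's client class or packet API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class DownloadState { Downloading, NoNeededParts };

// Hypothetical stand-in for a download source; not aMule's client class.
class MockSource {
public:
    void AddPendingBlock(int blockId) { m_pendingBlocks.push_back(blockId); }

    // Send the cancel at most once, clear outstanding block requests,
    // then park the source in the "no needed parts" state.
    void GracefulDrop() {
        if (!m_sentCancel) {
            m_pendingBlocks.clear();
            m_sentPackets.push_back("OP_CANCELTRANSFER");
            m_sentCancel = true;
        }
        m_state = DownloadState::NoNeededParts;
    }

    std::size_t CancelCount() const { return m_sentPackets.size(); }
    std::size_t PendingCount() const { return m_pendingBlocks.size(); }
    DownloadState State() const { return m_state; }

private:
    bool m_sentCancel = false;
    DownloadState m_state = DownloadState::Downloading;
    std::vector<int> m_pendingBlocks;        // outstanding block requests
    std::vector<std::string> m_sentPackets;  // record of packets "sent"
};
```

The flag check makes the drop idempotent, which is the property that keeps the log line and the wire traffic in agreement when the branch is hit more than once.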
19dc62f to
9bb34be
|
Thanks for the review and the excellent suggestions! When testing the new nearCompletion trigger on a real download, the code activated the rotation path much more aggressively. This stress-test exposed two edge cases in the DropSlowSources logic, which we have now addressed in this PR:
The Issue: A segfault occurred because calling TickDownloadAndMeasure() could trigger the rotation logic, which synchronously removes clients from m_downloadingSourcesList while the loop was still iterating over it.
The Issue: The code could hit the wxFAIL_MSG("No free blocks to request after freeing some blocks"). This happened because forcing a slower client to free its blocks does not guarantee those blocks can be assigned to the faster client, specifically if the faster client does not have the part where those freed blocks reside.
Added the actual OP_CANCELTRANSFER packet dispatch in the graceful drop fallback.
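The first issue is a classic case of a container being modified while it is iterated. The fix text did not survive in this comment, so the following is not claimed to be the PR's approach; it is one common way to make such a loop safe, iterating over a snapshot and skipping entries that an earlier tick removed. `Source`, `Downloader`, and `TickAll` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical source: ticking it may evict another (slower) source.
struct Source {
    explicit Source(int i) : id(i), victim(nullptr) {}
    int id;
    Source* victim;  // source to remove when this one is ticked, if any
};

class Downloader {
public:
    void Add(Source* s) { m_sources.push_back(s); }
    std::size_t Count() const { return m_sources.size(); }

    // Iterate over a copy of the list so that removals performed by
    // Tick() cannot invalidate the iterator driving the loop.
    int TickAll() {
        std::vector<Source*> snapshot(m_sources.begin(), m_sources.end());
        int ticked = 0;
        for (Source* s : snapshot) {
            if (!Contains(s)) {
                continue;  // removed by an earlier tick in this pass
            }
            Tick(s);
            ++ticked;
        }
        return ticked;
    }

private:
    bool Contains(const Source* s) const {
        return std::find(m_sources.begin(), m_sources.end(), s) != m_sources.end();
    }

    // Stand-in for the rotation logic: synchronously removes the victim.
    void Tick(Source* s) {
        if (s->victim) {
            m_sources.remove(s->victim);
        }
    }

    std::list<Source*> m_sources;
};
```

The membership check costs a linear scan per source, which is acceptable for short source lists; a larger list might prefer deferring removals until the pass has finished.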
|
@Cflsft, the latest changes all landed correctly: the graceful-drop now actually dispatches

Three things before this can land:

1. The test cycle needs to be shown, across several different runs. Your comment says it was verified, but no logs are attached, and since this activates a rotation path that's been effectively dead for years, "works perfectly" on a single download isn't enough. Please paste, as text in code fences, the

2. Drop the leftover comments. The graceful-drop branch still carries the commented-out original

3. This turns a years-dormant code path on for the entire user base. That's the heart of our caution, and it can't be overstated. Today
9cf8006 to
5bdab54
|
Although I have already successfully downloaded quite a few files of various types and sizes with this patch, I completely agree with your caution regarding the blast radius of activating this globally. To properly address the concerns about the heuristic and edge cases, I am going to test for the next few days to gather logs across a variety of downloads (different sizes, ≤4 parts, "all sources slow", etc.). Thanks for the meticulous review!
|
Great, thank you for the contribution. 👍 |
This PR solves the notorious issue where a download gets stuck at 99% because the last remaining block is exclusively assigned to a very slow or dead source.
It implements a BitTorrent-style "Endgame Mode":
GetNextEmptyBlockInPart allows multiple available sources to request the same missing block concurrently.
HasRequestedBlock ensures that a single source doesn't redundantly request the same block from itself.
IsComplete()) safely discard any incoming redundant data in memory. The core then automatically cancels the remaining redundant transfers via OP_CANCELTRANSFER.
This change guarantees that downloads finish smoothly without stalling, with negligible and strictly controlled bandwidth overhead.
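The mechanism in this description can be illustrated with a small model of a single block. This is a sketch only, assuming a hypothetical `EndgameBlock` class with integer source ids; it is not the PR's code and does not use aMule's types. It shows the three behaviours the description names: several sources may request the same block, one source cannot request it twice, and data arriving after completion is discarded while the remaining requesters are returned for cancellation.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model of one block during endgame; not aMule's API.
class EndgameBlock {
public:
    // A source may request the block if it is still incomplete and this
    // source has not already requested it.
    bool TryRequest(int sourceId) {
        if (m_complete) {
            return false;
        }
        return m_requesters.insert(sourceId).second;
    }

    // First delivery completes the block and returns the other requesters,
    // which the caller would cancel. Later deliveries are redundant and
    // are dropped, returning an empty list.
    std::vector<int> OnData(int sourceId) {
        std::vector<int> toCancel;
        if (m_complete) {
            return toCancel;  // redundant data, discard
        }
        m_complete = true;
        for (int id : m_requesters) {
            if (id != sourceId) {
                toCancel.push_back(id);
            }
        }
        m_requesters.clear();
        return toCancel;
    }

    bool IsComplete() const { return m_complete; }

private:
    bool m_complete = false;
    std::set<int> m_requesters;  // sources with an outstanding request
};
```

The bandwidth cost of this design is bounded by whatever the redundant sources send before their cancel arrives, which is the overhead the review thread weighs against rotating work between sources.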