Fix - Model Download Race Condition by VultureZZ · Pull Request #2305 · lemonade-sdk/lemonade

VultureZZ · 2026-06-18T14:24:27Z

This commit introduces a mechanism to ensure that only one download operation occurs at a time for each model. It adds the `awaitExistingModelDownload` function to wait for any in-progress downloads before initiating a new one, preventing potential conflicts and data corruption. Additionally, a mutex-based locking system is implemented in the `ModelManager` and `HttpClient` to manage concurrent download requests effectively. This enhancement improves the reliability of model downloads across multiple requests.


fl0rianr

Thanks for working on this — the direction looks good, especially the path-level serialization in HttpClient and the per-model lock in ModelManager.

I think this still needs changes before merge; I noted them below.

Suggested fixes:

- Make /load and collection component auto-downloads cache-first where appropriate, e.g. download_registered_model(info, true).
- Use the same canonical model identity for server download job keys and frontend active-download matching.
- Add a regression test for /pull + /load concurrency and, ideally, alias/canonical model names.

fl0rianr · 2026-06-18T17:09:58Z

+    std::lock_guard<std::mutex> download_guard(*model_lock);
+
+    // Another caller may have finished while we waited for the model lock.
+    if (do_not_upgrade && is_model_downloaded(info.model_name)) {


This post-lock re-check only catches cache-first callers. /load still calls download_registered_model(info) with the default do_not_upgrade=false, so a /load request queued behind an in-flight /pull can wait on this mutex and then still proceed into the download/update path. Could we either make /load call download_registered_model(info, true) or make this guard explicitly skip when the model became downloaded while waiting?

This matters because handle_load() currently downloads missing models with download_registered_model(info) and no do_not_upgrade=true.
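
The second option above (skip when the model became downloaded while waiting) could look roughly like the following. This is a minimal sketch, not the PR's code: `ModelManagerSketch`, `should_skip_after_wait`, and the in-memory `downloaded_` set are hypothetical stand-ins, and a single mutex is used where the PR keeps one lock per model. The idea is to record whether the model was present before blocking on the lock, so a caller that queued behind another download can be told apart from a caller that explicitly wants an upgrade.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Decide whether a caller that just acquired the model lock can skip the download.
// Skips cache-first callers, and also callers that saw the model missing before
// they waited but find it present afterwards (another caller finished it).
bool should_skip_after_wait(bool downloaded_before_wait, bool downloaded_now, bool do_not_upgrade) {
    if (!downloaded_now) return false;
    return do_not_upgrade || !downloaded_before_wait;
}

// Hypothetical stand-in for ModelManager; state is tracked in memory only.
class ModelManagerSketch {
public:
    // Returns true if this call ran the (simulated) download, false if it skipped.
    bool download_registered_model(const std::string& model_name, bool do_not_upgrade = false) {
        // Snapshot taken before blocking on the download lock.
        const bool downloaded_before_wait = is_model_downloaded(model_name);
        std::lock_guard<std::mutex> download_guard(download_mutex_);
        if (should_skip_after_wait(downloaded_before_wait,
                                   is_model_downloaded(model_name),
                                   do_not_upgrade)) {
            return false;
        }
        {
            std::lock_guard<std::mutex> state_guard(state_mutex_);
            downloaded_.insert(model_name);  // simulated download
            ++download_count_;
        }
        assert(is_model_downloaded(model_name));
        return true;
    }

    bool is_model_downloaded(const std::string& model_name) const {
        std::lock_guard<std::mutex> state_guard(state_mutex_);
        return downloaded_.count(model_name) > 0;
    }

    int download_count() const {
        std::lock_guard<std::mutex> state_guard(state_mutex_);
        return download_count_;
    }

private:
    std::mutex download_mutex_;       // assumption: one lock here; the PR uses one per model
    mutable std::mutex state_mutex_;  // protects the bookkeeping below
    std::set<std::string> downloaded_;
    int download_count_ = 0;
};
```

With this shape, a /load that arrives while a /pull is in flight sees the model as missing before the wait and present after it, so it returns without downloading again, while a deliberate upgrade request on an already-present model still proceeds.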

fl0rianr · 2026-06-18T17:10:53Z

+            auto operation = [this, model_name, request_json, do_not_upgrade](DownloadProgressCallback progress_cb) {
+                model_manager_->download_model(model_name, request_json, do_not_upgrade, progress_cb);
+            };
+            auto job = start_download_job("model:" + model_name, "model", model_name, operation);


This job key uses the raw request model_name, while the ModelManager download lock uses resolve_model_name(model_name). Alias vs canonical requests for the same logical model can therefore create separate server job IDs/UI rows even though they serialize lower down. Could we key the download job with the same resolved model name used by get_model_download_lock()?
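
One way to picture the suggestion is to derive the job key from the resolved name rather than the raw request name. The sketch below is illustrative only: the alias table, the free-function form of `resolve_model_name`, and `make_download_job_key` are assumptions made for this example, and the real resolution logic in ModelManager may work differently.

```cpp
#include <map>
#include <string>

// Hypothetical alias table lookup; unknown names resolve to themselves.
std::string resolve_model_name(const std::map<std::string, std::string>& aliases,
                               const std::string& requested_name) {
    auto it = aliases.find(requested_name);
    return it == aliases.end() ? requested_name : it->second;
}

// Build the job key from the canonical name so that alias and canonical
// requests for the same logical model map to the same job.
std::string make_download_job_key(const std::map<std::string, std::string>& aliases,
                                  const std::string& requested_name) {
    return "model:" + resolve_model_name(aliases, requested_name);
}
```

Under this scheme a request made through an alias and one made through the canonical name yield the same key, so they would share a single job ID and UI row instead of only serializing at the lock.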

fl0rianr · 2026-06-18T17:11:42Z

+
+  const serverDownloads = await downloadTracker.hydrateFromServer();
+  const active = serverDownloads.find(
+    item => item.model_name === modelName &&


Same alias/canonicalization issue on the client side: this only detects an existing server download if item.model_name exactly equals the caller’s modelName. If another caller started the same model through a different alias, ensureModelReady() can miss the active job and start another /pull. Could we normalize model names here or have the server snapshot expose a canonical model ID to compare against?

fl0rianr · 2026-06-18T17:12:26Z

    if (!isDownloaded) {
-      await pullModel(modelName, { declaredSizeGB: modelsData[modelName]?.size });
+      if (downloadTracker.isActive(modelName) ||
+          await downloadTracker.hasActiveServerDownload(modelName)) {


This pre-flight check has the same exact-match limitation as awaitExistingModelDownload(). If server-side dedupe is intended to be “one download per logical model”, the client check should use the same identity semantics as the server lock, not only the raw UI/request name.

fl0rianr · 2026-06-18T17:13:30Z

                                         const std::map<std::string, std::string>& headers,
                                         const DownloadOptions& options) {
+    auto path_lock = g_path_download_locks.acquire(output_path);
+    std::lock_guard<std::mutex> path_guard(*path_lock);


This path-level lock is a good last line of defense against .partial corruption. Given that the higher-level model/job dedupe can still miss alias/canonical cases, could we add a regression test that races two callers against the same output path and verifies that the partial file is not concurrently written/corrupted?
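
A regression test along these lines might be sketched as follows. `PathLockRegistry` is a hypothetical stand-in for `g_path_download_locks`, assumed only to hand out one shared mutex per path, and the simulated write is a short sleep rather than real file I/O. The check here is mutual exclusion (never more than one writer inside the guarded region for a given path); a fuller test would also compare the resulting file contents.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical registry handing out one shared mutex per output path.
class PathLockRegistry {
public:
    std::shared_ptr<std::mutex> acquire(const std::string& path) {
        std::lock_guard<std::mutex> guard(registry_mutex_);
        auto& slot = locks_[path];
        if (!slot) slot = std::make_shared<std::mutex>();
        return slot;
    }

private:
    std::mutex registry_mutex_;
    std::map<std::string, std::shared_ptr<std::mutex>> locks_;
};

// Races `callers` threads against one path and returns the largest number of
// threads observed inside the guarded "write" region at the same time.
int max_concurrent_writers(PathLockRegistry& registry, const std::string& path, int callers) {
    assert(callers > 0);
    std::atomic<int> active{0};
    std::atomic<int> max_seen{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < callers; ++i) {
        threads.emplace_back([&] {
            auto path_lock = registry.acquire(path);
            std::lock_guard<std::mutex> path_guard(*path_lock);
            const int now = ++active;
            int prev = max_seen.load();
            // Raise max_seen to `now` if it is the highest value so far.
            while (now > prev && !max_seen.compare_exchange_weak(prev, now)) {}
            std::this_thread::sleep_for(std::chrono::milliseconds(5));  // simulated write
            --active;
        });
    }
    for (auto& t : threads) t.join();
    return max_seen.load();
}
```

If the path lock works as intended, the function returns 1 regardless of how many callers race; removing the `path_guard` line would let the count rise above 1 on most runs.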

VultureZZ and others added 2 commits June 18, 2026 09:34

Merge branch 'main' into pr/download-sync

0c07d0d

github-actions Bot added the bug Something isn't working label Jun 18, 2026

fl0rianr requested changes Jun 18, 2026

VultureZZ wants to merge 2 commits into lemonade-sdk:main from VultureZZ:pr/download-sync
