
DM-55232: fix zstd failure in aggregate-graph due to huge logs#565

Merged
TallJimbo merged 2 commits into main from tickets/DM-55232 on Jun 23, 2026


TallJimbo (Member) commented Jun 17, 2026

Checklist

  • ran Jenkins
  • ran and inspected package-docs build
  • added a release note for user-visible changes to doc/changes

codecov (Bot) commented Jun 17, 2026

Codecov Report

❌ Patch coverage is 42.42424% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.67%. Comparing base (6dcf094) to head (ad03ba2).
✅ All tests successful. No failed tests found.

Files with missing lines | Patch % | Lines
...lsst/pipe/base/quantum_graph/aggregator/_writer.py | 38.70% | 17 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #565      +/-   ##
==========================================
- Coverage   88.73%   88.67%   -0.07%     
==========================================
  Files         160      160              
  Lines       22120    22151      +31     
  Branches     2625     2627       +2     
==========================================
+ Hits        19629    19642      +13     
- Misses       1849     1866      +17     
- Partials      642      643       +1     
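The coverage percentages in the diff follow from the line counts: coverage is hits divided by all tracked lines (hits + misses + partials), so 19642 + 1866 + 643 = 22151 matches the Lines row. A minimal sketch that reproduces both figures, assuming Codecov's default of rounding down to two decimal places; `coverage_percent` is a hypothetical helper, not part of any Codecov tool:

```python
def coverage_percent(hits: int, misses: int, partials: int) -> str:
    """Coverage as hits / (hits + misses + partials), rounded *down* to
    two decimal places (assumed to match Codecov's default rounding)."""
    lines = hits + misses + partials
    # Integer floor division avoids floating-point rounding surprises.
    basis_points = hits * 10_000 // lines
    return f"{basis_points // 100}.{basis_points % 100:02d}%"


print(coverage_percent(19629, 1849, 642))  # base (main): 88.73%
print(coverage_percent(19642, 1866, 643))  # head (#565): 88.67%
```

Rounding down matters here: plain rounding would report the base as 88.74%.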


MichelleGower (Contributor) left a comment

A couple comments. Merge approved.

    break
if training_sample_size >= self.comms.config.zstd_dict_input_max_bytes:
    self.comms.log.warning(
        "Truncating compression dict training sample after %d predicted quanta.",

MichelleGower (Contributor):

I wonder whether it would be quicker to understand if, instead of "Truncating", the message said something like "Reached compression dict training sample limit after ...".

MichelleGower (Contributor):

Also, do you want a similar warning at the break above, for the count limit?

TallJimbo (Member, Author):

I've also reworked the logic to break before exceeding the limit, rather than just after.
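The reworked check (stop before a block would push the sample over the byte limit, and warn at the count limit too) can be sketched as a standalone loop. This is a minimal sketch, not the `_writer.py` code: `gather_training_sample`, `max_count`, and `max_bytes` are hypothetical names, with `max_bytes` standing in for `zstd_dict_input_max_bytes`.

```python
import logging
from collections.abc import Iterable

_LOG = logging.getLogger(__name__)


def gather_training_sample(
    blocks: Iterable[bytes], max_count: int, max_bytes: int
) -> list[bytes]:
    """Collect dictionary-training blocks without exceeding either limit.

    Hypothetical helper; the limits stand in for the real config options.
    """
    sample: list[bytes] = []
    sample_size = 0
    for block in blocks:
        if len(sample) >= max_count:
            # Warn when the count limit cuts the sample short, too.
            _LOG.warning(
                "Reached compression dict training sample count limit after %d blocks.",
                len(sample),
            )
            break
        if sample_size + len(block) > max_bytes:
            # Check *before* appending, so the sample never exceeds max_bytes.
            _LOG.warning(
                "Reached compression dict training sample size limit after %d blocks.",
                len(sample),
            )
            break
        sample.append(block)
        sample_size += len(block)
    return sample
```

Testing `sample_size + len(block) > max_bytes` before appending keeps the sample within the byte limit, whereas testing `training_sample_size >= ...` after appending (as in the excerpt above) lets the last block overshoot, which can be large when a single log block is huge.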

pred_quanta_sum = sum(len(block) for block in training_inputs[:n_pred_quanta])
prov_quanta_sum = sum(len(block) for block in training_inputs[n_pred_quanta::3])
metadata_sum = sum(len(block) for block in training_inputs[n_pred_quanta + 1 :: 3])
logs_sum = sum(len(block) for block in training_inputs[n_pred_quanta + 2 :: 3])
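The four slices above imply a layout for `training_inputs`: the first `n_pred_quanta` blocks are predicted quanta, followed by one (provenance quantum, metadata, log) triple per quantum. A minimal sketch of the same sums with the step held in a named constant, assuming that layout; `PROV_BLOCK_FIELDS`, `BLOCKS_PER_PROV_QUANTUM`, and `block_sums` are hypothetical names:

```python
# Hypothetical: the block kinds written per provenance quantum, in order,
# as implied by the ``::3`` slices with offsets 0, 1, and 2.
PROV_BLOCK_FIELDS = ("prov_quanta", "metadata", "logs")
BLOCKS_PER_PROV_QUANTUM = len(PROV_BLOCK_FIELDS)


def block_sums(training_inputs: list[bytes], n_pred_quanta: int) -> dict[str, int]:
    """Total block size per kind, assuming predicted-quantum blocks come
    first, followed by interleaved (quantum, metadata, log) triples."""
    sums = {"pred_quanta": sum(len(b) for b in training_inputs[:n_pred_quanta])}
    for offset, field in enumerate(PROV_BLOCK_FIELDS):
        start = n_pred_quanta + offset
        sums[field] = sum(
            len(b) for b in training_inputs[start::BLOCKS_PER_PROV_QUANTUM]
        )
    return sums
```

Deriving the step from the field tuple means adding or removing a block kind updates the stride and the offsets together.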

MichelleGower (Contributor):

I would possibly put the step (3) in a variable to make sure future changes update all of them. But the lines are all together here, so...

If it's easy to write a unit test that catches adding or removing something in training_inputs without changing the step, that could save us from having to track down a bug in the exception-handling code.
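A test of the kind suggested above could build `training_inputs` the same way the writer does and check that each strided slice lands on exactly one block kind. A minimal sketch using `unittest`; `build_training_inputs` and `BLOCKS_PER_PROV_QUANTUM` are hypothetical stand-ins, and a real test would need to exercise the writer's own construction code rather than a copy of it:

```python
import unittest

# Hypothetical stand-ins for the writer's layout; not the real pipe_base code.
BLOCKS_PER_PROV_QUANTUM = 3


def build_training_inputs(
    pred_blocks: list[bytes], prov_records: list[tuple[bytes, bytes, bytes]]
) -> list[bytes]:
    """Predicted-quantum blocks first, then one (quantum, metadata, log)
    triple per provenance record."""
    inputs = list(pred_blocks)
    for quantum, metadata, log in prov_records:
        inputs.extend([quantum, metadata, log])
    return inputs


class TrainingInputLayoutTestCase(unittest.TestCase):
    def test_step_matches_layout(self) -> None:
        pred = [b"p1", b"p2"]
        prov = [(b"q1", b"m1", b"l1"), (b"q2", b"m2", b"l2")]
        inputs = build_training_inputs(pred, prov)
        tail = inputs[len(pred):]
        # If a block kind is added or removed without updating the step,
        # the tail length no longer equals step * number of records...
        self.assertEqual(len(tail), BLOCKS_PER_PROV_QUANTUM * len(prov))
        # ...and the strided slices start mixing different block kinds.
        self.assertEqual(tail[0::BLOCKS_PER_PROV_QUANTUM], [b"q1", b"q2"])
        self.assertEqual(tail[1::BLOCKS_PER_PROV_QUANTUM], [b"m1", b"m2"])
        self.assertEqual(tail[2::BLOCKS_PER_PROV_QUANTUM], [b"l1", b"l2"])
```

Run it with `python -m unittest` or pytest.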

TallJimbo (Member, Author):

This case should be rare enough and the test looks tricky enough that I'm going to skip it.

TallJimbo merged commit 683f0f6 into main Jun 23, 2026
19 of 21 checks passed
TallJimbo deleted the tickets/DM-55232 branch June 23, 2026 17:12
2 participants