Skip to content

Avoid race condition when conditionally starting Lumi begin stream#51280

Open
Dr15Jones wants to merge 4 commits into
cms-sw:masterfrom
Dr15Jones:fixLumi
Open

Avoid race condition when conditionally starting Lumi begin stream#51280
Dr15Jones wants to merge 4 commits into
cms-sw:masterfrom
Dr15Jones:fixLumi

Conversation

@Dr15Jones

@Dr15Jones Dr15Jones commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

PR description:

Problem was the atomic could be incremented after it was already decremented to 0 once. This lead to the endLumiAsync being called twice for the same Lumi.

This fixes a problem seen in the IBs.

PR validation:

I had a small test program that tended to fail after 10s of 1000s of Lumis. With this changes, I have several times run a job which processed more than a million Lumis without issue.

resolves cms-sw/framework-team#2316
resolves #51265

@Dr15Jones

Copy link
Copy Markdown
Contributor Author

please test

@cmsbuild

cmsbuild commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

cms-bot internal usage

@cmsbuild

Copy link
Copy Markdown
Contributor

@cmsbuild

Copy link
Copy Markdown
Contributor

A new Pull Request was created by @Dr15Jones for master.

It involves the following packages:

  • FWCore/Framework (core)

@Dr15Jones, @makortel, @smuzaffar can you please review it and eventually sign? Thanks.
@makortel, @wddgit this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

Comment thread FWCore/Framework/src/EventProcessor.cc Outdated
Problem was the atomic could be incremented after it was already
decremented to 0 once. This lead to the endLumiAsync being called
twice for the same Lumi.
@Dr15Jones

Copy link
Copy Markdown
Contributor Author

please test

@cmsbuild

Copy link
Copy Markdown
Contributor

@cmsbuild

Copy link
Copy Markdown
Contributor

Pull request #51280 was updated. @Dr15Jones, @makortel, @smuzaffar can you please check and sign again.

@makortel

Copy link
Copy Markdown
Contributor

Looks good to me

@cmsbuild

Copy link
Copy Markdown
Contributor

+1

Size: This PR adds an extra 44KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7871f7/54183/summary.html
COMMIT: c6c416a
CMSSW: CMSSW_20_1_X_2026-06-22-1100/el9_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/51280/54183/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 45
  • DQMHistoTests: Total histograms compared: 3414477
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3414459
  • DQMHistoTests: Total skipped: 18
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 44 files compared)
  • Checked 195 log files, 163 edm output root files, 45 DQM output files

@cmsbuild

Copy link
Copy Markdown
Contributor

@cmsbuild

Copy link
Copy Markdown
Contributor

Pull request #51280 was updated. @Dr15Jones, @makortel, @smuzaffar can you please check and sign again.

@cmsbuild

Copy link
Copy Markdown
Contributor

+1

Size: This PR adds an extra 28KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7871f7/54257/summary.html
COMMIT: 17a56de
CMSSW: CMSSW_20_1_X_2026-06-24-1100/el9_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/51280/54257/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 1 lines from the logs
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 45
  • DQMHistoTests: Total histograms compared: 3414477
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3414459
  • DQMHistoTests: Total skipped: 18
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 44 files compared)
  • Checked 195 log files, 163 edm output root files, 45 DQM output files
  • TriggerResults: no differences found

- guarantee that at least 1 stream processes a Lumi even if the Lumi has no Events
@cmsbuild

Copy link
Copy Markdown
Contributor

@cmsbuild

Copy link
Copy Markdown
Contributor

Pull request #51280 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.

@Dr15Jones

Copy link
Copy Markdown
Contributor Author

please test

The test holds an Event from the first Lumi until the Event from
the 3rd Lumi is started. This forces the system to be able to skip
the streamBeginLumi for the 2nd Lumi.
@cmsbuild

Copy link
Copy Markdown
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-51280/49943

@cmsbuild

Copy link
Copy Markdown
Contributor

Pull request #51280 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix race condition when starting Lumi begin stream Crash in edm::EventProcessor::clearLumiPrincipal()

4 participants