
Adding recHit masking to extended pixel tracks in heterogeneous CA #51268

Open
borzari wants to merge 1 commit into cms-sw:master from borzari:hit_masking_201X_squashed

@borzari commented Jun 19, 2026

PR description:

This PR was co-authored by @AdrianoDee

This PR adds the ability to mask recHits during doublet creation in the heterogeneous CA algorithm. This capability was already available in the serial-only CA used for offline track reconstruction; adding it here allows extended pixel tracks to be produced on device with the heterogeneous CA algorithm for Phase-2 offline track reconstruction. This PR serves as groundwork for future developments and should be tested together with the implementation in PR #51085.

This work was presented at the general TRK POG meeting on 09 June 2026. In summary, the implementation includes the following:

  • A new one-column SoA to store the values of the masks
    • The values are 0 (not masked) or greater than 0 (masked; each iteration can write a distinct value into the mask vector, which records which iteration masked which hit)
  • A new module that performs the masking
  • A modification of the doublet-producing kernel to skip masked input and output hits
  • A new module that merges the output SoAs from distinct iterations
    • All the steps up to the conversion of the pixel tracks to legacy can be done on device
    • There is a preliminary hits-based duplicates removal procedure included
  • An extra column in the pixel tracks SoA to save the iteration that produced a given track
    • Mostly for validation
  • A procModifier to turn on one extra iteration in extended pixel tracks reconstruction
  • A runTheMatrix offset to run the extended pixel tracks reconstruction with two iterations for validation

For each tracking iteration, the masking works as follows:

  • An initial all-zero masking SoA of size nHits is created together with the hits
  • The CA algorithm is executed and produces pixelTracksIt1
  • The masking module is run to produce a new masking vector based on the previous vector and the produced tracks
    • A simple track quality requirement selects only the relevant tracks whose hits are to be masked
  • The new masking collection is passed as input to another execution of the CA algorithm, which will not consider the masked hits

This process can be iterated N times and, at the end, the SoAs merging module is executed to merge the N iterations. Finally, the merged SoA can go to the legacy conversion module.
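The iteration scheme above can be sketched in a few lines of plain Python. This is only a toy model under stated assumptions, not the code in this PR: `run_ca`, `update_mask` and `reconstruct` are hypothetical names, the "CA" simply groups unmasked hits by a truth particle label, and the quality requirement is assumed to be "at least 3 hits".

```python
from collections import defaultdict


def run_ca(hits, mask, min_pt, iteration):
    """Toy stand-in for the CA: group unmasked hits above min_pt by particle label."""
    groups = defaultdict(list)
    for idx, (particle, pt) in enumerate(hits):
        if mask[idx] == 0 and pt >= min_pt:  # masked hits are skipped
            groups[particle].append(idx)
    # Assumed quality flag: a track is "good" if it has at least 3 hits.
    return [
        {"hits": idxs, "iteration": iteration, "good": len(idxs) >= 3}
        for idxs in groups.values()
    ]


def update_mask(mask, tracks, iteration):
    """Return a new mask; hits of good tracks are tagged with the iteration index."""
    new_mask = list(mask)
    for track in tracks:
        if not track["good"]:
            continue  # only tracks passing the quality requirement mask their hits
        for idx in track["hits"]:
            if new_mask[idx] == 0:
                new_mask[idx] = iteration
    return new_mask


def reconstruct(hits, pt_cuts):
    """Run one CA pass per pT cut, masking used hits between passes."""
    mask = [0] * len(hits)  # initial all-zero mask, one entry per hit
    merged = []
    for iteration, min_pt in enumerate(pt_cuts, start=1):
        tracks = run_ca(hits, mask, min_pt, iteration)
        merged.extend(tracks)  # stand-in for the final SoA merging step
        mask = update_mask(mask, tracks, iteration)
    return merged, mask


# Each hit is (particle label, pT in GeV); the values are made up for illustration.
hits = [("a", 3.0)] * 4 + [("b", 0.7)] * 3 + [("c", 0.6)] * 2
tracks, mask = reconstruct(hits, pt_cuts=[2.0, 0.5])
print(mask)
```

In this toy input the first pass (2.0 GeV) claims the four hits of particle "a", and the second pass (0.5 GeV) only sees the remaining hits, so the mask values record which pass used each hit.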

PR validation:

Plots that compare the pixel tracks reconstruction baseline with two implementations of the two-iterations reconstruction are available here:

  • baseline: runs the regular extended pixel tracks reconstruction schedule with masking disabled
  • highpT + lowpT: both iterations are copies of baseline but with pT requirements during doublets and ntuplets building changed to 2.0 GeV and 0.5 GeV, respectively
  • baseline + lowpT: same as described above for the respective iterations; suggested at the TRK POG meeting

Two new plot folders, pixelHighPt and pixelLowPt, were added, mostly for validation. In the one-iteration case, the same plots that go into the Pixel tracks folder are copied to those folders, while in the two-iteration cases, highpT/baseline plots go into pixelHighPt and lowpT plots go into pixelLowPt. For the two-iteration cases, the plots in Pixel tracks are made from the merged and converted pixelTracks SoA. Comparing baseline with baseline + lowpT shows that, as expected, including a low-pT iteration increases the reconstruction efficiency for lower-pT pixel tracks. The fake and duplicate rates also increase, because the second iteration still reconstructs copies of the first-iteration tracks. This could be improved by a better duplicate-removal algorithm in the SoA merger, or by a better track-selection algorithm, since the one applied in the pixel tracks reconstruction schedule is still based on cuts on track properties. The low-pT folder also shows that the masking is applied properly: there is a large drop in efficiency at ~0.9 GeV (~2.0 GeV), corresponding to the recHits masked by the baseline (highpT) iteration, which uses that value as the pT requirement in the CA module.

Timing plots were not produced for the iterations described above. However, another two-iteration procedure that used masking for HLT tracking was shown at the same general TRK POG meeting on 09 June 2026; it shows that adding an extra iteration changes the timing very little, as can be seen in the plot below. A modification to one of the masking kernels, suggested by @Parsifal-2045, greatly reduced the execution time of the recHits masking.

[image: timing plot]

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Not a backport

cmsbuild commented Jun 19, 2026

cms-bot internal usage

@cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-51268/49857

@cmsbuild

A new Pull Request was created by @borzari for master.

It involves the following packages:

  • Configuration/ProcessModifiers (operations)
  • Configuration/PyReleaseValidation (pdmv)
  • DataFormats/TrackSoA (heterogeneous, reconstruction)
  • DataFormats/TrackingRecHitSoA (heterogeneous, reconstruction)
  • RecoLocalTracker/SiPixelRecHits (reconstruction)
  • RecoTracker/PixelSeeding (reconstruction)
  • RecoTracker/PixelTrackFitting (reconstruction)
  • Validation/RecoTrack (dqm)

@AdrianoDee, @DickyChant, @Moanwar, @antoniovagnerini, @cmsbuild, @ctarricone, @davidlange6, @fabiocos, @ftenchini, @fwyzard, @gabrielmscampos, @jfernan2, @kfjack, @makortel, @mandrenguyen, @miquork, @rseidita, @srimanob, @sroychow can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @Martin-Grunewald, @VinInn, @VourMa, @dgulhan, @dkotlins, @elusian, @fabiocos, @felicepantaleo, @ferencek, @gpetruc, @makortel, @missirol, @mmasciov, @mmusich, @mroguljic, @mtosi, @rovere, @slomeo, @threus, @tsusa, @wmtford this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@AdrianoDee

test parameters:

@AdrianoDee

test parameters:

@AdrianoDee

test parameters:

@AdrianoDee

please test

@cmsbuild

-1

Failed Tests: UnitTests RelVals RelVals-AMD_W7900 AddOn
Size: This PR adds an extra 128KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c9dd2d/54115/summary.html
COMMIT: 837a5ad
CMSSW: CMSSW_20_1_X_2026-06-19-1100/el9_amd64_gcc13
Additional Tests: HLT_P2_TIMING,HLT_P2_INTEGRATION,GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/51268/54115/install.sh to create a dev area with all the needed externals and cmssw changes.

HLT P2 Timing: chart

Failed Unit Tests

I found 3 errors in the following unit tests:

---> test test_SpecialFullOutput had ERRORS
---> test test_HIonFullOutput had ERRORS
---> test test_GRunFullOutput had ERRORS

Failed RelVals

----- Begin Fatal Exception 19-Jun-2026 18:31:16 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 391699 lumi: 55 event: 65980774 stream: 0
   [1] Running path 'MC_PFScouting_v15'
   [2] Calling method for module CAHitNtupletAlpakaPhase1@alpaka/'hltPixelTracksSoA'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: PortableHostCollection<reco::TrackingRecHitsMaskingLayout<128,false> >
Looking for module label: hltPhase2PixelRecHitsExtendedSoA
Looking for productInstanceName: 

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "TryToContinue = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 19-Jun-2026 18:31:43 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 393240 lumi: 150 event: 169546534 stream: 0
   [1] Running path 'MC_PFScouting_v15'
   [2] Calling method for module CAHitNtupletAlpakaPhase1@alpaka/'hltPixelTracksSoA'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: PortableHostCollection<reco::TrackingRecHitsMaskingLayout<128,false> >
Looking for module label: hltPhase2PixelRecHitsExtendedSoA
Looking for productInstanceName: 

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "TryToContinue = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 19-Jun-2026 18:41:42 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
   [1] Running path 'MC_PFScouting_v15'
   [2] Calling method for module CAHitNtupletAlpakaPhase1@alpaka/'hltPixelTracksSoA'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: PortableHostCollection<reco::TrackingRecHitsMaskingLayout<128,false> >
Looking for module label: hltPhase2PixelRecHitsExtendedSoA
Looking for productInstanceName: 

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "TryToContinue = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

----- End Fatal Exception -------------------------------------------------

Failed RelVals-AMD_W7900

  • 34434.404: 34434.404_TTbar_14TeV+Run4D121_Patatrack_PixelOnlyAlpaka_Profiling/step3_TTbar_14TeV+Run4D121_Patatrack_PixelOnlyAlpaka_Profiling.log
  • 34434.403: 34434.403_TTbar_14TeV+Run4D121_Patatrack_PixelOnlyAlpaka_Validation/step3_TTbar_14TeV+Run4D121_Patatrack_PixelOnlyAlpaka_Validation.log
  • 34434.402: 34434.402_TTbar_14TeV+Run4D121_Patatrack_PixelOnlyAlpaka/step3_TTbar_14TeV+Run4D121_Patatrack_PixelOnlyAlpaka.log

Failed AddOn Tests

----- Begin Fatal Exception 19-Jun-2026 18:25:02 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 4 stream: 2
   [1] Running path 'HLT_L1HT200_QuadPFJet25_PNet1BTag0p50_PNet1Tauh0p50_v10'
   [2] Calling method for module CAHitNtupletAlpakaPhase1@alpaka/'hltPixelTracksSoA'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: PortableHostCollection<reco::TrackingRecHitsMaskingLayout<128,false> >
Looking for module label: hltPhase2PixelRecHitsExtendedSoA
Looking for productInstanceName: 

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "TryToContinue = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 19-Jun-2026 18:26:16 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 2 stream: 2
   [1] Running path 'HLT_L1Topo_Mu12_PFHT50_HLTTopo0p98_v4'
   [2] Calling method for module CAHitNtupletAlpakaPhase1@alpaka/'hltPixelTracksSoA'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: PortableHostCollection<reco::TrackingRecHitsMaskingLayout<128,false> >
Looking for module label: hltPhase2PixelRecHitsExtendedSoA
Looking for productInstanceName: 

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "TryToContinue = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 19-Jun-2026 18:24:50 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
   [1] Running path 'HLT_HICentrality50100MinimumBiasHF1AND_Beamspot_v7'
   [2] Calling method for module CAHitNtupletAlpakaHIonPhase1@alpaka/'hltPixelTracksPPOnAASoA'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: PortableHostCollection<reco::TrackingRecHitsMaskingLayout<128,false> >
Looking for module label: hltPhase2PixelRecHitsExtendedSoA
Looking for productInstanceName: 

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "TryToContinue = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

----- End Fatal Exception -------------------------------------------------


pixelTracksLowPtAlpakaPhase2Extended.trackQualityCuts.minPt = cms.double(lowPtPtMinCut + 0.05)
pixelTracksLowPtAlpakaPhase2Extended.geometry.ptCuts = cms.vdouble(
lowPtPtMinCut, lowPtPtMinCut, lowPtPtMinCut, lowPtPtMinCut, lowPtPtMinCut,

if the values are all intentionally the same, perhaps [lowPtPtMinCut]*73
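As a minimal sketch of the reviewer's suggestion, list repetition builds the whole vector from a single value. The value 0.5 is an assumption taken from the lowpT iteration described in the PR text; the count 73 is the one quoted in the comment.

```python
# Assumed value in GeV, matching the lowpT iteration described in the PR.
lowPtPtMinCut = 0.5

# One entry per cut, all intentionally identical.
ptCuts = [lowPtPtMinCut] * 73

# In the configuration file this list would then be wrapped, for example as
# cms.vdouble(*ptCuts); that line is left as a comment so the sketch runs
# without a CMSSW environment.
print(len(ptCuts), set(ptCuts))
```

Repeating a list of floats this way is safe because floats are immutable; changing one entry later does not affect the others.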

Comment on lines +26 to +29

using TrackingRecHitsMaskingCollection = std::conditional_t<std::is_same_v<Device, alpaka::DevCpu>,
::reco::TrackingRecHitsMaskingHost,
::reco::TrackingRecHitsMaskingDevice<Device>>;

@borzari can you move the new classes to their own header files (e.g. DataFormats/TrackingRecHitSoA/interface/alpaka/TrackingRecHitsMaskingCollection.h), and add them to the serialisation plugins under DataFormats/TrackingRecHitSoA/plugins/ and DataFormats/TrackingRecHitSoA/plugins/alpaka/ ?

ParamsOnDevice const* cpeParams,
Queue queue) const;
};
class PixelRecHitMaskingKernel {

move to PixelRecHitMaskingKernel.h ?

also, since the object does not have any state, could makeHitsMaskingAsync be a free function instead ?

return hits_d;
}

TrackingRecHitsMaskingCollection PixelRecHitMaskingKernel::makeHitsMaskingAsync(uint32_t const nHits,

move to PixelRecHitMaskingKernel.dev.cc ?

Comment on lines +223 to +232
class LaunchZerosPixelMask {
public:
ALPAKA_FN_ACC void operator()(Acc1D const& acc, ::reco::TrackingRecHitsMaskingView mask) const {
for (uint32_t ic : cms::alpakatools::independent_group_elements(acc, mask.metadata().size())) {
assert(ic < (uint32_t)mask.metadata().size());
mask[ic].recHitMask() = 0;
}
alpaka::syncBlockThreads(acc);
}
};

this kernel could be replaced by alpaka::memset ?

Comment on lines +454 to +456
alpaka::memcpy(queue,
cms::alpakatools::make_device_view(queue, mask.view().recHitMask(), nHits),
cms::alpakatools::make_device_view(queue, mask_d.view().recHitMask(), nHits));

Would it be equivalent to do

Suggested change
- alpaka::memcpy(queue,
- cms::alpakatools::make_device_view(queue, mask.view().recHitMask(), nHits),
- cms::alpakatools::make_device_view(queue, mask_d.view().recHitMask(), nHits));
+ alpaka::memcpy(queue, mask.buffer(), mask_d.buffer());

?

Params m_params;
};

class CAHitMaskingAndMerger {

Move to its own file, CAHitMaskingAndMerger.h ?

Comment on lines +88 to +102
MapToHit makeMaskingAsync(MapToHit const& mask_d,
TkSoADevice const& tracks_d,
const pixelTrack::Quality minQuality,
uint32_t const& iterationIndex,
Queue& queue) const;

void updateHitOffsets(
int const& tksBeg, int const& tksEnd, int const& nHits, TkSoADevice& tracks_d, Queue& queue) const;

TkSoADevice makeFilteredTracks(int const& nTracks,
int const& nHits,
TkSoADevice const& inpTracks,
pixelTrack::Quality const& minQuality,
double const& matchFraction,
Queue& queue) const;

In general we are trying to move to a syntax where the queue (or sometimes the device) is always the first argument.
Could you rearrange these functions' arguments?

#endif

int threadsPerBlock = 128;
int blocks = int((trackd_view.metadata().size() + threadsPerBlock - 1) / threadsPerBlock);

cms::alpakatools::divide_up_by
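The expression in the snippet above is a hand-written ceiling division, which is the arithmetic the suggested helper encapsulates. The following is a plain Python stand-in for that arithmetic only; the function here merely borrows the helper's name and is not the CMSSW implementation.

```python
def divide_up_by(value, divisor):
    """Ceiling division for positive integers: smallest n with n * divisor >= value."""
    return (value + divisor - 1) // divisor


threads_per_block = 128
for n_elements in (1, 128, 129, 1000):
    blocks = divide_up_by(n_elements, threads_per_block)
    # blocks * threads_per_block always covers n_elements
    print(n_elements, blocks)
```

For 129 elements this gives 2 blocks, and for 1000 elements it gives 8, the smallest block counts whose 128 threads each cover every element.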

fwyzard commented Jun 25, 2026

@borzari as a general comment, can you split the new code into separate files ?
In general, having each kernel and wrapper in its own file keeps the code more organised (CMS coding rules mandate that the file name matches the main class or function defined there) and should help with the compilation speed (different files are compiled in parallel, reducing the wait caused by the longer compilations).


int threadsPerBlock = 128;
int blocks = inpTrack_view.metadata().size();
const auto workDiv1D = cms::alpakatools::make_workdiv<Acc1D>(blocks, threadsPerBlock);

Do you really want to use one block per track, with 128 threads each ?
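To make the question concrete, the sketch below compares the number of threads launched by the two layouts on a GPU-like backend, where a grid runs blocks × threads-per-block threads. The SoA size of 10000 is a hypothetical number chosen for illustration, and whether one block per track is intended depends on how the kernel distributes its work.

```python
def divide_up_by(value, divisor):
    """Ceiling division for positive integers."""
    return (value + divisor - 1) // divisor


def total_threads(blocks, threads_per_block):
    """Threads launched by a 1D grid on a GPU-like backend."""
    return blocks * threads_per_block


n_tracks = 10_000  # hypothetical SoA size
threads_per_block = 128

# Layout in the snippet above: one block per track.
one_block_per_track = total_threads(n_tracks, threads_per_block)

# Alternative layout: one thread per track, rounded up to whole blocks.
one_thread_per_track = total_threads(
    divide_up_by(n_tracks, threads_per_block), threads_per_block
)

print(one_block_per_track, one_thread_per_track)
```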

const ::reco::TrackHitSoAConstView &inpTrackHit_view,
const pixelTrack::Quality minQuality,
const double matchFraction) const {
if (alpaka::getIdx<alpaka::Grid, alpaka::Threads>(acc)[0] == 0) {

Suggested change
- if (alpaka::getIdx<alpaka::Grid, alpaka::Threads>(acc)[0] == 0) {
+ if (cms::alpakatools::once_per_grid(acc)) {

unless you need it explicitly to be done by thread 0.

continue;

bool hasDuplicate = false;
for (uint32_t j : cms::alpakatools::uniform_elements_x(acc, inpTrack_view.metadata().size())) {

This is a one-dimensional kernel (Acc1D), so uniform_elements_x can simply be uniform_elements.

However it does not make sense to use a uniform_elements[_x] loop inside the check at line 710, that restricts the execution to a single thread per grid.

if (hasDuplicate)
break;
}
alpaka::syncBlockThreads(acc);

I think this is wrong - it's not possible to synchronise inside the check at line 720.

}
};

class Kernel_filterTracks {

IIUC the idea is to check all tracks vs all tracks.

I think it would be more efficient to use a 2D kernel with (for example) 16×16 threads per block, so that more of the memory is reused.

And since the comparison is symmetric whole blocks can quickly exit if they are in the lower (or upper) triangular part of the matrix.

Note that one should probably not launch the kernel with N_tracks × N_tracks total threads.
Find a number of blocks that gives a good occupancy, and let each block process more data. Even better if you can keep one of the two groups of tracks fixed, so they don't need to be re-read from memory.
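The tiling idea can be illustrated on the host with a short Python sketch. It is a serial model of the access pattern only, not a device kernel: each (row, column) tile plays the role of one 2D block, tiles below the diagonal exit immediately because the comparison is symmetric, and the shared-hit criterion (`match_fraction` of the shorter track's hits) is an assumption, since the actual criterion is not shown here.

```python
def find_duplicates(tracks, match_fraction, tile=16):
    """Flag duplicate tracks using a tiled, upper-triangular all-vs-all comparison.

    tracks: list of sets of hit indices.
    Returns the set of track indices flagged as duplicates.
    """
    n = len(tracks)
    duplicates = set()
    for row0 in range(0, n, tile):          # one "block row" per tile of tracks i
        for col0 in range(0, n, tile):      # one "block column" per tile of tracks j
            if col0 < row0:
                continue  # lower-triangular tile: already covered by symmetry
            for i in range(row0, min(row0 + tile, n)):
                # Inside a diagonal tile, only compare pairs with j > i.
                for j in range(max(col0, i + 1), min(col0 + tile, n)):
                    shared = len(tracks[i] & tracks[j])
                    shorter = min(len(tracks[i]), len(tracks[j]))
                    if shorter and shared / shorter >= match_fraction:
                        # Assumed policy: keep the longer track, flag the other.
                        duplicates.add(j if len(tracks[j]) <= len(tracks[i]) else i)
    return duplicates


# Made-up tracks: track 1 shares all of its hits with track 0.
tracks = [{0, 1, 2, 3}, {0, 1, 2}, {4, 5, 6}, {5, 6, 7, 8}]
print(find_duplicates(tracks, match_fraction=0.75, tile=2))
```

The result does not depend on the tile size; the tile only changes how the pairs are grouped, which on a device determines how much track data each block can reuse.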


namespace ALPAKA_ACCELERATOR_NAMESPACE {

class PixelTracksMaskingSoA : public global::EDProducer<> {

PixelTracksMaskingSoAProducer ?

Comment on lines +119 to +135
for (const auto& it : inputTkSoAs) {
auto nTksAuxDev = it->view().tracks().metadata().size();
auto nHitsAuxDev = it->view().trackHits().metadata().size();

reco::TracksHost itHost(queue, nTksAuxDev, nHitsAuxDev);

alpaka::memcpy(queue, itHost.buffer(), it->buffer());
alpaka::wait(queue);

int nTksAux = itHost.view().tracks().nTracks();

nTks.push_back(nTksAux);

int nHitsAux = 0;
for (int i = 0; i < nTksAux; ++i) {
nHitsAux += ::reco::nHits(itHost.view().tracks(), i);
}

I'm not sure what is being copied or why - but if there is more than one input collection, I think it would be better to first launch all the copies, then synchronise only once.

}
} // namespace

void PixelTracksSoAMerger::produce(edm::StreamID streamID,

can the bulk of the work be done directly on gpu ?

All additions seem commented out.
Can you just drop the changes from this file ?
