Adding recHit masking to extended pixel tracks in heterogeneous CA#51268
Adding recHit masking to extended pixel tracks in heterogeneous CA#51268borzari wants to merge 1 commit into
Conversation
|
cms-bot internal usage |
|
A new Pull Request was created by @borzari for master. It involves the following packages:
@AdrianoDee, @DickyChant, @Moanwar, @antoniovagnerini, @cmsbuild, @ctarricone, @davidlange6, @fabiocos, @ftenchini, @fwyzard, @gabrielmscampos, @jfernan2, @kfjack, @makortel, @mandrenguyen, @miquork, @rseidita, @srimanob, @sroychow can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
|
test parameters:
|
|
test parameters:
|
|
test parameters:
|
|
please test |
|
-1 Failed Tests: UnitTests RelVals RelVals-AMD_W7900 AddOn HLT P2 Timing: chart Failed Unit TestsI found 3 errors in the following unit tests: ---> test test_SpecialFullOutput had ERRORS ---> test test_HIonFullOutput had ERRORS ---> test test_GRunFullOutput had ERRORS Failed RelValsExpand to see more relval errors ...Failed RelVals-AMD_W7900
Failed AddOn Tests |
|
|
||
| pixelTracksLowPtAlpakaPhase2Extended.trackQualityCuts.minPt = cms.double(lowPtPtMinCut + 0.05) | ||
| pixelTracksLowPtAlpakaPhase2Extended.geometry.ptCuts = cms.vdouble( | ||
| lowPtPtMinCut, lowPtPtMinCut, lowPtPtMinCut, lowPtPtMinCut, lowPtPtMinCut, |
There was a problem hiding this comment.
if the values are all intentionally the same, perhaps [lowPtPtMinCut]*73
|
|
||
| using TrackingRecHitsMaskingCollection = std::conditional_t<std::is_same_v<Device, alpaka::DevCpu>, | ||
| ::reco::TrackingRecHitsMaskingHost, | ||
| ::reco::TrackingRecHitsMaskingDevice<Device>>; |
There was a problem hiding this comment.
@borzari can you move the new classes to their own headers files (e.g. DataFormats/TrackingRecHitSoA/interface/alpaka/TrackingRecHitsMaskingCollection.h), and add them to the serialisation plugins under DataFormats/TrackingRecHitSoA/plugins/ and DataFormats/TrackingRecHitSoA/plugins/alpaka/ ?
| ParamsOnDevice const* cpeParams, | ||
| Queue queue) const; | ||
| }; | ||
| class PixelRecHitMaskingKernel { |
There was a problem hiding this comment.
move to PixelRecHitMaskingKernel.h ?
There was a problem hiding this comment.
also, since the object does not have any state, could makeHitsMaskingAsync be a free function instead ?
| return hits_d; | ||
| } | ||
|
|
||
| TrackingRecHitsMaskingCollection PixelRecHitMaskingKernel::makeHitsMaskingAsync(uint32_t const nHits, |
There was a problem hiding this comment.
move to PixelRecHitMaskingKernel.dev.cc ?
| class LaunchZerosPixelMask { | ||
| public: | ||
| ALPAKA_FN_ACC void operator()(Acc1D const& acc, ::reco::TrackingRecHitsMaskingView mask) const { | ||
| for (uint32_t ic : cms::alpakatools::independent_group_elements(acc, mask.metadata().size())) { | ||
| assert(ic < (uint32_t)mask.metadata().size()); | ||
| mask[ic].recHitMask() = 0; | ||
| } | ||
| alpaka::syncBlockThreads(acc); | ||
| } | ||
| }; |
There was a problem hiding this comment.
this kernel could be replaced by alpaka::memset ?
| alpaka::memcpy(queue, | ||
| cms::alpakatools::make_device_view(queue, mask.view().recHitMask(), nHits), | ||
| cms::alpakatools::make_device_view(queue, mask_d.view().recHitMask(), nHits)); |
There was a problem hiding this comment.
Would it be equivalent to do
| alpaka::memcpy(queue, | |
| cms::alpakatools::make_device_view(queue, mask.view().recHitMask(), nHits), | |
| cms::alpakatools::make_device_view(queue, mask_d.view().recHitMask(), nHits)); | |
| alpaka::memcpy(queue, mask.buffer(), mask_d.buffer()); |
?
| Params m_params; | ||
| }; | ||
|
|
||
| class CAHitMaskingAndMerger { |
There was a problem hiding this comment.
Move to its own file, CAHitMaskingAndMerger.h ?
| MapToHit makeMaskingAsync(MapToHit const& mask_d, | ||
| TkSoADevice const& tracks_d, | ||
| const pixelTrack::Quality minQuality, | ||
| uint32_t const& iterationIndex, | ||
| Queue& queue) const; | ||
|
|
||
| void updateHitOffsets( | ||
| int const& tksBeg, int const& tksEnd, int const& nHits, TkSoADevice& tracks_d, Queue& queue) const; | ||
|
|
||
| TkSoADevice makeFilteredTracks(int const& nTracks, | ||
| int const& nHits, | ||
| TkSoADevice const& inpTracks, | ||
| pixelTrack::Quality const& minQuality, | ||
| double const& matchFraction, | ||
| Queue& queue) const; |
There was a problem hiding this comment.
In general we are trying to move to a syntax where the queue (or sometimes the device) is always the first argument.
Could you rearrange these functions' arguments?
| #endif | ||
|
|
||
| int threadsPerBlock = 128; | ||
| int blocks = int((trackd_view.metadata().size() + threadsPerBlock - 1) / threadsPerBlock); |
There was a problem hiding this comment.
cms::alpakatools::divide_up_by
|
@borzari as a general comment, can you split the new code into their own files ? |
|
|
||
| int threadsPerBlock = 128; | ||
| int blocks = inpTrack_view.metadata().size(); | ||
| const auto workDiv1D = cms::alpakatools::make_workdiv<Acc1D>(blocks, threadsPerBlock); |
There was a problem hiding this comment.
Do you really want to use one block per track, with 128 threads each ?
| const ::reco::TrackHitSoAConstView &inpTrackHit_view, | ||
| const pixelTrack::Quality minQuality, | ||
| const double matchFraction) const { | ||
| if (alpaka::getIdx<alpaka::Grid, alpaka::Threads>(acc)[0] == 0) { |
There was a problem hiding this comment.
| if (alpaka::getIdx<alpaka::Grid, alpaka::Threads>(acc)[0] == 0) { | |
| if (cms::alpakatools::once_per_grid(acc)) { |
unless you need it explicitly to be done by thread 0.
| continue; | ||
|
|
||
| bool hasDuplicate = false; | ||
| for (uint32_t j : cms::alpakatools::uniform_elements_x(acc, inpTrack_view.metadata().size())) { |
There was a problem hiding this comment.
This is a one-dimensional kernel (Acc1D), so uniform_elements_x can simply be uniform_elements.
However it does not make sense to use a uniform_elements[_x] loop inside the check at line 710, that restricts the execution to a single thread per grid.
| if (hasDuplicate) | ||
| break; | ||
| } | ||
| alpaka::syncBlockThreads(acc); |
There was a problem hiding this comment.
I think this is wrong - it's not possible to synchronise inside the check at line 720.
| } | ||
| }; | ||
|
|
||
| class Kernel_filterTracks { |
There was a problem hiding this comment.
IIUC the idea is to check all tracks vs all tracks.
I think it would be more efficient to use a 2D kernel with (for example) 16×16 threads per block, so that more of the memory is reused.
And since the comparison is symmetric whole blocks can quickly exit if they are in the lower (or upper) triangular part of the matrix.
Note that one should probably not launch the kernel with N_tracks × N_tracks total threads.
Find a number of blocks that gives a good occupancy, and let each block process more data. Even better if you can keep one of the two groups of tracks fixed, so they don't need to be re-read from memory.
|
|
||
| namespace ALPAKA_ACCELERATOR_NAMESPACE { | ||
|
|
||
| class PixelTracksMaskingSoA : public global::EDProducer<> { |
There was a problem hiding this comment.
PixelTracksMaskingSoAProducer ?
| for (const auto& it : inputTkSoAs) { | ||
| auto nTksAuxDev = it->view().tracks().metadata().size(); | ||
| auto nHitsAuxDev = it->view().trackHits().metadata().size(); | ||
|
|
||
| reco::TracksHost itHost(queue, nTksAuxDev, nHitsAuxDev); | ||
|
|
||
| alpaka::memcpy(queue, itHost.buffer(), it->buffer()); | ||
| alpaka::wait(queue); | ||
|
|
||
| int nTksAux = itHost.view().tracks().nTracks(); | ||
|
|
||
| nTks.push_back(nTksAux); | ||
|
|
||
| int nHitsAux = 0; | ||
| for (int i = 0; i < nTksAux; ++i) { | ||
| nHitsAux += ::reco::nHits(itHost.view().tracks(), i); | ||
| } |
There was a problem hiding this comment.
I'm not sure about what is being copied or why - but if there are more than one input collections, I think it would be better to first launch all copies, then synchronise only once.
| } | ||
| } // namespace | ||
|
|
||
| void PixelTracksSoAMerger::produce(edm::StreamID streamID, |
There was a problem hiding this comment.
can the bulk of the work be done directly on gpu ?
There was a problem hiding this comment.
All additions seem commented out.
Can you just drop the changes from this file ?
PR description:
This PR was co-authored by @AdrianoDee
This PR adds the possibility to mask recHits during doublets creation in the heterogeneous CA algorithm. This capability was already available at the serial-only CA used for the offline tracks reconstruction, and this step allows the production of extended pixel tracks on device using the heterogeneous CA algorithm for Phase-2 offline tracks reconstruction. This PR will serve as ground work for future developments. This PR should be tested with the implementations of PR#51085
This work was presented at the general TRK POG meeting on 09 June 2026. As a summary, the implementation includes the following:
For each tracking iteration, the masking works as follows:
This process can be iterated N times and, at the end, the SoAs merging module is executed to merge the N iterations. Finally, the merged SoA can go to the legacy conversion module.
PR validation:
Plots that compare the pixel tracks reconstruction baseline with two implementations of the two-iterations reconstruction are available here:
Two new folders for plots were added, mostly for validation, called
pixelHighPtandpixelLowPt. In the one-iteration case, the same plots that go in thePixel tracksare copied to those folders, while in the two-iteration cases, highpT/baseline plots go intopixelHighPtand lowpT plots go intopixelLowPt. The plots inPixel tracksfor the two-iteration cases consider the merged and converted pixelTracks SoA. When comparing baseline with baseline + lowpT it is possible to notice that, as expected, when including a low pT iteration, the performance in terms of reconstruction efficiency is increased for lower pT pixel tracks, while the fakes and duplicates rates is also increased, given that the second iteration still reconstructs copies of the first iteration tracks. This could be improved by a better duplicates removal algorithm in the SoA merger, or a better tracks selection algorithm, since the one applied in the pixel tracks reconstruction schedule is still based on track properties' cuts. In thelow pTfolder, it is also possible to notice that the masking is being applied properly, since there is a large drop in efficiency at ~0.9 GeV (~2.0 GeV), related to the recHits masked in the baseline (highpT) iteration that has those values as the pT requirements in the CA module.Timing performance plots were not produced using the iterations displayed above, but another two-iteration procedure that used masking for HLT tracking was shown at the same general TRK POG meeting on 09 June 2026, which shows that the inclusion of an extra iteration changes very little timing-wise, as can be seen in the plot below. A modification in one of the masking kernels was suggested by @Parsifal-2045, which greatly reduced the timing of the recHits masking execution.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
Not a backport