Skip to content

Reduce the memory highwatermark in DistributedClosestPoint::computeClosestPoints#1889

Open
MrBurmark wants to merge 10 commits into
developfrom
bugfix/burmark1/DistributedClosestPointSendCompletionCleanUp
Open

Reduce the memory highwatermark in DistributedClosestPoint::computeClosestPoints#1889
MrBurmark wants to merge 10 commits into
developfrom
bugfix/burmark1/DistributedClosestPointSendCompletionCleanUp

Conversation

@MrBurmark

@MrBurmark MrBurmark commented Jun 18, 2026

Copy link
Copy Markdown
Member

Reduce the memory highwatermark in DistributedClosestPoint::computeClosestPoints. In DistributedClosestPoint::computeClosestPoints cleanup conduit nodes as soon as they are no longer needed instead of waiting until the end of the routine.
Also refactor storage to use unique_ptr instead of shared_ptr.

Summary

  • This PR is a refactoring/bugfix
  • It does the following (modify list as needed):
    • Modifies/refactors DistributedClosestPoint::computeClosestPoints to use unique_ptr
    • Fixes an issue where conduit Nodes lived longer than necessary causing increased memory usage

In DistributedClosestPoint::computeClosestPoints cleanup
conduit nodes as soon as they are no longer needed instead of waiting
until the end of the routine.
Also refactor storage to use unique_ptr instead of shared_ptr.
@MrBurmark MrBurmark requested review from bmhan12 and kennyweiss June 18, 2026 18:44
@MrBurmark MrBurmark added the Quest Issues related to Axom's 'quest' component label Jun 18, 2026
@rhornung67

Copy link
Copy Markdown
Member

@MrBurmark thanks for this. i ran clang-format on your branch so the CI checks will run

@MrBurmark MrBurmark changed the title Free conduit nodes earlier in DistributedClosestPoint Reduce the memory highwatermark in DistributedClosestPoint::computeClosestPoints Jun 18, 2026
Comment on lines 485 to 490
std::vector<MPI_Request> reqs;
for(auto& isr : isendRequests)
reqs.reserve(isendRequests.size());
for(auto const& isr : isendRequests)
{
reqs.push_back(isr.m_request);
reqs.push_back(isr.first.m_request);
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocating and freeing a vector of requests every time this function is called seems not optimal, but I didn't want to change too much at once.

The Request object holds onto the packed data so
we don't need to hold onto the original node while
the isends are processing.
…eanUp' of github.com:llnl/axom into bugfix/burmark1/DistributedClosestPointSendCompletionCleanUp
@MrBurmark

Copy link
Copy Markdown
Member Author

I has codex take a look and it pointed out that the requests own their own packed buffers. So I am now removing the nodes even earlier.

@edponce edponce left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are some general observations on how check_send_requests() is implemented and used which can be discussed with another PR. Ideally, the argument isendRequests would be a std::vector that can be used directly in MPI_[Wait|Test]some(). When using these MPI functions, there is no need to resize the requests array and inCount can remain invariant, but they can change if needed. The completed requests are nullified in the input requests array and ignored when used again. Another observation, the MPI standard defines MPI_Request as an opaque handle, and it is not recommended to consider it a copyable datatype although in many MPI implementations it is implemented as an integer.

Also, when waiting to complete all remaining non-blocking sends, these can be handled with MPI_Waitall() instead of using a while loop and invoking check_send_requests() multiple times.

@publixsubfan

Copy link
Copy Markdown
Contributor

@MrBurmark - are the host-side memory allocations that aren't made via Umpire the primary concern/target of these code changes?

@MrBurmark

MrBurmark commented Jun 25, 2026

Copy link
Copy Markdown
Member Author

@MrBurmark - are the host-side memory allocations that aren't made via Umpire the primary concern/target of these code changes?

@publixsubfan Yes, the memory in the conduit nodes and requests is my main concern here. In this PR I reduce the lifetime of the nodes, but did not change the lifetime of the requests.
I would like to pass an allocator to the conduit nodes and requests so we can use umpire for pools and counting, but I did not see an existing mechanism in axom for passing in allocators for different use-cases just the allocator that the parallel computation uses.

@MrBurmark

Copy link
Copy Markdown
Member Author

These are some general observations on how check_send_requests() is implemented and used which can be discussed with another PR. Ideally, the argument isendRequests would be a std::vector that can be used directly in MPI_[Wait|Test]some(). When using these MPI functions, there is no need to resize the requests array and inCount can remain invariant, but they can change if needed. The completed requests are nullified in the input requests array and ignored when used again. Another observation, the MPI standard defines MPI_Request as an opaque handle, and it is not recommended to consider it a copyable datatype although in many MPI implementations it is implemented as an integer.

Also, when waiting to complete all remaining non-blocking sends, these can be handled with MPI_Waitall() instead of using a while loop and invoking check_send_requests() multiple times.

Given the request abstraction, I'm not sure that will be an easy thing to do in general. Perhaps if there was a abstraction for collections of requests?

@rhornung67

Copy link
Copy Markdown
Member

@publixsubfan I have been working through different approaches at making the Axom host allocation interface more consistent and flexible. You may recall that I put up a couple of PRs recently looking for comments and feedback. After discussions with several folks, I closed those and went back to the drawing board.

I am starting to work through a new PR where all host allocations will require an explicit allocation mechanism to be provided (e.g., Axom malloc, Umpire Host, or something else such as Umpire Pinned). Should we discuss this ASAP, or would it be better to talk about it when I have something close to done? I'm hoping to have a good draft by early next week.

@MrBurmark

Copy link
Copy Markdown
Member Author

@publixsubfan I have been working through different approaches at making the Axom host allocation interface more consistent and flexible. You may recall that I put up a couple of PRs recently looking for comments and feedback. After discussions with several folks, I closed those and went back to the drawing board.

I am starting to work through a new PR where all host allocations will require an explicit allocation mechanism to be provided (e.g., Axom malloc, Umpire Host, or something else such as Umpire Pinned). Should we discuss this ASAP, or would it be better to talk about it when I have something close to done? I'm hoping to have a good draft by early next week.

@rhornung67 I would be interested in talking earlier rather than later.

@publixsubfan

publixsubfan commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

@rhornung67 I would be interested in talking earlier rather than later.

I’ll be back around next week if you guys want to take this offline. That being said, presuming it’s just the changes here I don’t see anything too controversial in this PR, just that long-term a broader solution might be along the lines that @rhornung67 is proposing with a new host memory interface.

@rhornung67

Copy link
Copy Markdown
Member

@MrBurmark I merged Axom develop into your PR branch and ran clang-format on it. Now all Axom tests pass.

We should probably figure out what we are not testing in your use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Quest Issues related to Axom's 'quest' component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants