SST Master Branch Merger: Auto Create Pull Request to Promote from devel to master - All Tests Ran Clean #2693
Merged
Fix sumi straddr buffer

* Format into the caller-supplied `buf` and return it instead of leaking a `new[]`-allocated buffer; report the required size via `*len`.
* Use `snprintf` instead of an unbounded `strcpy` so the caller's buffer size is respected.
* Return `NULL` if the `addr` or `len` pointer is missing, before dereferencing `*len`.
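The three bullets describe a common C/C++ pattern: validate the pointers, let `snprintf` bound the write to the caller's buffer, and hand back the size actually needed. The sketch below illustrates that pattern only. `straddr_sketch` and `SketchAddr` are hypothetical names, the node/port format string is invented, and treating `*len` as "capacity on entry, required size including the NUL on return" is an assumption about the real sumi function, whose signature is not shown in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical address type: the real sumi address structure is not shown
// in this PR, so a node/port pair stands in for it.
struct SketchAddr {
  int node;
  int port;
};

// Sketch of the fixed pattern, not the actual sumi straddr signature.
// Assumption: on entry *len is the capacity of buf; on return it holds the
// buffer size the full string needs, including the terminating NUL.
// Returns buf, or nullptr if addr or len is null (checked before *len is read).
char* straddr_sketch(const SketchAddr* addr, char* buf, std::size_t* len) {
  if (addr == nullptr || len == nullptr) {
    return nullptr;
  }
  // snprintf writes at most cap bytes and NUL-terminates when cap > 0.
  // With a null buf, pass cap = 0 so the call only measures the output.
  std::size_t cap = (buf != nullptr) ? *len : 0;
  int needed = std::snprintf(buf, cap, "node=%d,port=%d", addr->node, addr->port);
  if (needed < 0) {
    return nullptr;  // encoding error reported by snprintf
  }
  *len = static_cast<std::size_t>(needed) + 1;
  return buf;
}
```

No heap allocation happens inside the function, so nothing can leak. A caller detects truncation by comparing the returned `*len` with the capacity it passed in. When `buf` is null the return value is also null, so in that measuring mode the caller should read `*len` rather than the pointer.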
…2687)

The intra-node loopback delivery path truncated multi-segment sends, causing ProcessQueuesState::copyIoVec() to abort on assert(copied == len) when an Allgather-class collective runs with multiple ranks bound to a single endpoint NIC.

Root cause
----------

For on-node peers, processSendLoop() packs the MatchHdr plus *every* data segment of the sender's I/O vector into a single vector (vec[0] = MatchHdr, vec[1..N] = data segments). The LoopReq constructor, however, kept only the first segment (vec[1]) and discarded vec[2..N], while the MatchHdr it carries still reports the *total* byte count across all segments. At delivery, copyIoVec() is asked to copy the full multi-segment length but only has the first segment as source, so copied stalls below len and the assertion fires (or, under NDEBUG, a short/incorrect receive buffer is produced silently).

Multiple segments arise because Allgather::initIoVec() coalesces only physically contiguous chunks: a recursive-doubling stage whose send window wraps the modular buffer is split into two or more non-contiguous runs. This is why the failure reproduces reliably at high PPN (almost every neighbour is on-node) and even at 2 ranks/node with a large payload, but never for point-to-point or small single-segment collectives.

Fix
---

* LoopReq (ctrlMsgProcessQueuesState.h): copy every data segment (vec[1..N]) instead of only vec[1], so multi-segment intra-node sends are delivered intact.
* copyIoVec (ctrlMsgProcessQueuesState.cc): harden as defence-in-depth. Bound the copy loops on `rV < dst.size()` in the loop conditions (instead of an assert that is compiled out under NDEBUG), and replace the bare assert(copied == len) with a dbg().fatal() that reports copied, len, src/dst segment counts and total byte counts. A mismatch is an internal defect between sender and receiver of the same collective algorithm, so it is surfaced loudly rather than silently truncated.
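The two Fix bullets can be illustrated with a simplified, self-contained sketch. This is not the Firefly code. `IoVec`, `LoopReqSketch`, and `copyIoVecSketch` are hypothetical stand-ins, the real `MatchHdr` is represented only by the assumption that `vec[0]` holds the header, and `fprintf` plus `abort()` stand in for `dbg().fatal()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical, simplified I/O vector entry (not the real Firefly type).
struct IoVec {
  void* addr;
  std::size_t len;
};

// Fixed loopback request: vec[0] is assumed to be the match header and
// vec[1..N] the data segments. Every data segment is kept, not just vec[1].
struct LoopReqSketch {
  std::vector<IoVec> data;
  explicit LoopReqSketch(const std::vector<IoVec>& vec) {
    assert(!vec.empty() && "vec[0] must hold the match header");
    for (std::size_t i = 1; i < vec.size(); ++i) {
      data.push_back(vec[i]);
    }
  }
};

static std::size_t totalBytes(const std::vector<IoVec>& v) {
  std::size_t sum = 0;
  for (const IoVec& e : v) sum += e.len;
  return sum;
}

// Copies len bytes from the src segments into the dst segments. Both loops
// check the vector bounds in their conditions, and a short copy is reported
// and aborts at runtime instead of relying on an assert removed by NDEBUG.
std::size_t copyIoVecSketch(const std::vector<IoVec>& dst,
                            const std::vector<IoVec>& src, std::size_t len) {
  std::size_t copied = 0;
  std::size_t rV = 0;  // index of the current destination segment
  std::size_t rP = 0;  // byte offset inside dst[rV]
  for (std::size_t sV = 0; sV < src.size() && copied < len; ++sV) {
    std::size_t sP = 0;  // byte offset inside src[sV]
    while (sP < src[sV].len && rV < dst.size() && copied < len) {
      std::size_t n = std::min({src[sV].len - sP, dst[rV].len - rP, len - copied});
      std::memcpy(static_cast<char*>(dst[rV].addr) + rP,
                  static_cast<const char*>(src[sV].addr) + sP, n);
      sP += n;
      rP += n;
      copied += n;
      if (rP == dst[rV].len) {  // destination segment full: move to the next
        ++rV;
        rP = 0;
      }
    }
  }
  if (copied != len) {
    std::fprintf(stderr,
                 "copyIoVec mismatch: copied=%zu len=%zu src segs=%zu (%zu B) "
                 "dst segs=%zu (%zu B)\n",
                 copied, len, src.size(), totalBytes(src), dst.size(),
                 totalBytes(dst));
    std::abort();
  }
  return copied;
}
```

With the old single-segment behaviour, `data` would hold only the first segment and the mismatch branch would fire whenever the header's total exceeded that segment's length. Reporting the segment counts and byte totals mirrors what the commit message says the `dbg().fatal()` call logs.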
Verified by reproducing the abort with the original code at numCores >= 8, then confirming clean completion after the fix across numCores = 2/8/16/56 and count = 1/131072 (Allgather).

Fixes #2686
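The root-cause analysis attributes the extra segments to `Allgather::initIoVec()` coalescing only physically contiguous chunks, so a send window that wraps the modular buffer becomes two runs. The sketch below shows that effect in isolation. `contiguousRuns` is a hypothetical helper, and the layout (equal-sized chunks indexed modulo the chunk count, a window no longer than the buffer) is an assumption for illustration, not the actual Allgather buffer layout.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical layout: numChunks chunks of chunkBytes bytes each, indexed
// modulo numChunks. A window of count chunks starting at chunk first is
// coalesced into runs of physically contiguous bytes, returned as
// (byte offset, byte length) pairs.
std::vector<std::pair<std::size_t, std::size_t>> contiguousRuns(
    std::size_t numChunks, std::size_t chunkBytes, std::size_t first,
    std::size_t count) {
  assert(numChunks > 0 && count <= numChunks);
  std::vector<std::pair<std::size_t, std::size_t>> runs;
  for (std::size_t i = 0; i < count; ++i) {
    std::size_t chunk = (first + i) % numChunks;
    std::size_t offset = chunk * chunkBytes;
    if (!runs.empty() && runs.back().first + runs.back().second == offset) {
      runs.back().second += chunkBytes;  // adjacent in memory: extend the run
    } else {
      runs.emplace_back(offset, chunkBytes);  // wrapped around: start a new run
    }
  }
  return runs;
}
```

For example, with 8 chunks of 100 bytes, a 4-chunk window starting at chunk 6 covers chunks 6, 7, 0, 1 and yields the two runs (600, 200) and (0, 200), while the same window starting at chunk 2 stays a single run. A two-run send is exactly the multi-segment shape that the original LoopReq truncated to its first segment.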
Pull Request created to promote from the devel branch to master because the following Jenkins jobs passed successfully:
JENKINS_SRN/SST__Nightly_OSX-15-XC16_OMPI-4.1.6_PY3.10_Mainline : Build 954
JENKINS_SRN/SST__Nightly_OSX-15-XC16_OMPI-4.1.6_PY3.10_Mainline_MR-2 : Build 778
JENKINS_SRN/SST__Nightly_OSX-15-XC16_OMPI-4.1.6_PY3.10_Mainline_MT-2 : Build 777
JENKINS_SRN/SST__Nightly_OSX-15-XC16_OMPI-4.1.6_PY3.10_Mainline_OutOfSource : Build 775
JENKINS_SRN/SST__Nightly_OSX-15-XC16_OMPI-4.1.6_PY3.10_SST-Macro_NoCore : Build 778
JENKINS_SRN/SST__Nightly_OSX-15-XC16_OMPI-4.1.6_PY3.10_SST-Macro_WithCore : Build 775
JENKINS_SRN/SST__Nightly_OSX-26-XC26_OMPI-4.1.4_PY3.10_Mainline : Build 1634
JENKINS_SRN/SST__Nightly_OSX-26-XC26_OMPI-4.1.4_PY3.10_Mainline_MR-2 : Build 1622
JENKINS_SRN/SST__Nightly_OSX-26-XC26_OMPI-4.1.4_PY3.10_Mainline_MT-2 : Build 1614
JENKINS_SRN/SST__Nightly_OSX-26-XC26_OMPI-4.1.4_PY3.10_Mainline_OutOfSource : Build 1615
JENKINS_SRN/SST__Nightly_OSX-26-XC26_OMPI-4.1.4_PY3.10_SST-Macro_NoCore : Build 1455
JENKINS_SRN/SST__Nightly_OSX-26-XC26_OMPI-4.1.4_PY3.10_SST-Macro_WithCore : Build 1485
JENKINS_SRN/SST__Nightly_sst-test_clang18_OMPI-NONE_PY3.13_Mainline : Build 434
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_Mainline : Build 1937
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_Mainline_memH-A_Sweep-1 : Build 1909
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_Mainline_memH-A_Sweep-2 : Build 1917
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_Mainline_memH-A_Sweep-3 : Build 1910
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_Mainline_memH-A_Sweep-4 : Build 1910
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_Mainline_MR-2 : Build 1910
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_Mainline_MT-2 : Build 1908
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_Mainline_MT-4 : Build 1908
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_Mainline_MT-2_MR-2 : Build 381
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_Mainline_OutOfSource : Build 1906
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_Make-Dist : Build 1913
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_SST-Macro_NoCore : Build 1910
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_SST-Macro_WithCore : Build 1912
JENKINS_SRN/SST__Nightly_sst-test_OMPI-4.1.4_PY3.9_SST_Macro_Make-Dist : Build 1916
JENKINS_SRN/SST__Nightly_sst-test_OMPI-NONE_PY3.9_Mainline : Build 1895
JENKINS_SRN/SST__Nightly_sst-test_OMPI-NONE_PY3.9_Mainline_MT-2 : Build 1914
JENKINS_SRN/SST__Nightly_Ubuntu-24.04_OMPI-4.1.6_PY3.12_Mainline : Build 832
JENKINS_SRN/SST__Nightly_Ubuntu-24.04_OMPI-4.1.6_PY3.12_Mainline_MR-2 : Build 763
JENKINS_SRN/SST__Nightly_Ubuntu-24.04_OMPI-4.1.6_PY3.12_Mainline_MT-2 : Build 776
JENKINS_SRN/SST__Nightly_Ubuntu-24.04_OMPI-4.1.6_PY3.12_Make-Dist : Build 777
JENKINS_SRN/SST__Nightly_Ubuntu-24.04_OMPI-4.1.6_PY3.12_SST-Macro_NoCore : Build 768
JENKINS_SRN/SST__Nightly_Ubuntu-24.04_OMPI-4.1.6_PY3.12_SST-Macro_WithCore : Build 763
JENKINS_SRN/SST__Nightly_Ubuntu-26.04_OMPI-5.0.10_PY3.14_Mainline : Build 90
JENKINS_SRN/SST__Nightly_Ubuntu-26.04_Doxygen : Build 65
JENKINS_SRN/SST__Nightly_TOSS_4.8_OMPI-4.1.6_PY3.12_Mainline : Build 803
JENKINS_SRN/SST__Nightly_Rocky-10_OMPI-5.0.2_PY3.12_Mainline : Build 261
JENKINS_SRN/SST__Nightly_Rocky-9_OMPI-4.1.6_PY3.9_Mainline : Build 832
JENKINS_SRN/SST__Nightly_COERHEL-9_OMPI-4.1.6_PY3.9_Mainline : Build 851