[debug-tools] Enable multi-arch CI#5977

Draft
lumachad wants to merge 8 commits into main from users/lumachad/rocgdb/multi-arch-ci

@lumachad

Contributor

Enable ROCgdb multi-arch CI cross-triggers. Fix bugs and add unit tests.

lumachad and others added 4 commits June 19, 2026 05:26
The REPO_CONFIGS dictionary uses lowercase keys (rocm-libraries,
rocm-systems). When the repo name is extracted from the full repository
string (e.g. "ROCm/ROCgdb"), the split on "/" yields the original
casing, causing lookup failures for mixed-case repo names.

Lowercase the extracted name so the lookup is case-insensitive.
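
A minimal sketch of the case-insensitive lookup, assuming a trimmed-down REPO_CONFIGS and a hypothetical helper name `lookup_repo_config` (the actual function in detect_external_repo_config.py may be structured differently; the rocgdb entry shown is the one added later in this PR):

```python
# Illustrative subset only; the real REPO_CONFIGS has more entries and fields.
REPO_CONFIGS = {
    "rocgdb": {
        "cmake_source_var": "THEROCK_ROCGDB_SOURCE_DIR",
        "submodule_path": "debug-tools/rocgdb/source",
    },
}


def lookup_repo_config(full_repo_name: str) -> dict:
    """Hypothetical helper: resolve "owner/Name" to its REPO_CONFIGS entry."""
    # "ROCm/ROCgdb" -> "ROCgdb" -> "rocgdb"
    repo_name = full_repo_name.split("/")[-1].lower()
    if repo_name not in REPO_CONFIGS:
        raise KeyError(f"Unknown external repo: {repo_name}")
    return REPO_CONFIGS[repo_name]
```

Without the `.lower()` call, "ROCgdb" would miss the lowercase "rocgdb" key.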

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Add rocgdb to REPO_CONFIGS in detect_external_repo_config.py so it can
be used as an external repository in TheRock's multi-arch CI pipeline.

  cmake_source_var: THEROCK_ROCGDB_SOURCE_DIR
  submodule_path:   debug-tools/rocgdb/source

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Allow external repositories to pass additional CMake flags into TheRock
builds via the external_repo JSON input, without requiring TheRock to
hardcode repo-specific flags.

detect_external_repo_config.py reads extra_cmake_options from the
incoming --external-repo-json argument (alongside the existing projects
field) and forwards it through config_json to the build workflows.

The three artifact build workflows consume the new field by appending it
to the cmake invocation immediately after the _SOURCE_DIR flag:

  multi_arch_build_portable_linux_artifacts.yml
  multi_arch_build_windows_artifacts.yml
  multi_arch_build_wsl_rocdxg_artifacts.yml

External repos that need no extra flags omit the field (or pass an
empty string) and the cmake line expands to nothing, preserving full
backwards compatibility.
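
The forwarding described above could look roughly like the following. This is a sketch under assumptions: `build_config_json` and `cmake_flags` are hypothetical names, the real config_json carries more fields, and the workflows compose the cmake line in YAML/shell rather than Python.

```python
import json


def build_config_json(external_repo_json: str) -> str:
    """Forward selected fields from the caller's JSON into config_json."""
    external = json.loads(external_repo_json) if external_repo_json else {}
    config = {
        # Existing field, shown only for context.
        "projects": external.get("projects", ""),
        # New field: defaults to "" so callers that omit it are unaffected.
        "extra_cmake_options": external.get("extra_cmake_options", ""),
    }
    return json.dumps(config)


def cmake_flags(source_var: str, checkout_path: str, extra_cmake_options: str) -> str:
    """Compose the flags as one string; an empty extra value adds nothing."""
    return f"-D{source_var}={checkout_path} {extra_cmake_options}".strip()
```

With an empty `extra_cmake_options`, `cmake_flags` returns only the _SOURCE_DIR flag, which mirrors the backwards-compatibility claim.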

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Add unit tests covering the three changes to detect_external_repo_config.py:

- test_rocgdb_config: verifies the rocgdb REPO_CONFIGS entry has the
  correct cmake_source_var, submodule_path, and skip_submodules.

- test_external_repo_json_mixed_case_name: verifies that a full repo
  name like "ROCm/ROCgdb" is lowercased before the REPO_CONFIGS lookup,
  producing checkout_path "external-rocgdb" and resolving the correct
  cmake_source_var.

- test_extra_cmake_options_forwarded: verifies that extra_cmake_options
  supplied in the external_repo JSON is forwarded verbatim into
  config_json.

- test_extra_cmake_options_empty_by_default: verifies that omitting
  extra_cmake_options results in an empty string in config_json,
  preserving backwards compatibility.

- test_extra_cmake_options_multiple_flags: verifies that multiple
  space-separated flags are forwarded intact through the JSON
  parse/serialize round-trip.

- test_extra_cmake_options_embedded_quotes: verifies that embedded
  double quotes survive the JSON round-trip correctly; json.loads
  unescapes \" to ", then json.dumps re-escapes it back.
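
The embedded-quotes case can be reproduced with the standard json module alone. The sketch below is not the PR's test code; `round_trip` is a hypothetical helper and `-DFOO`/`-DBAR` are placeholder flags.

```python
import json


def round_trip(external_repo_json: str) -> tuple:
    """Parse the caller JSON, then re-serialize the forwarded field."""
    value = json.loads(external_repo_json).get("extra_cmake_options", "")
    serialized = json.dumps({"extra_cmake_options": value})
    return value, serialized


# Raw string: the JSON text itself contains backslash-escaped quotes.
RAW = r'{"extra_cmake_options": "-DFOO=\"a b\" -DBAR=ON"}'
value, serialized = round_trip(RAW)
```

After parsing, `value` holds plain double quotes; `serialized` contains the escaped form again, and parsing it a second time yields the same value.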

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
@lumachad lumachad added the ci:skip Skip all CI builds/tests for this PR label Jun 19, 2026
@lumachad

Contributor Author

ci:skip as this is still under validation against a ROCgdb counterpart: ROCm/ROCgdb#179

@lumachad lumachad force-pushed the users/lumachad/rocgdb/multi-arch-ci branch from 28b7ee4 to 65fae39 Compare June 19, 2026 19:09
The test matrix in fetch_test_configurations.py uses rocgdb-cpu and
rocgdb-gpu as job keys (to distinguish CPU-only from GPU tests), but the
TEST_SUBPROJECTS blocks for rocgdb and amd-dbgapi did not list those keys.
As a result, determine_rocm_test_dependencies.py returned names that
never matched the actual test job keys, so no rocgdb tests ran when
filtering by project.

- rocgdb: add rocgdb-cpu and rocgdb-gpu to TEST_SUBPROJECTS
- amd-dbgapi: replace rocgdb with rocgdb-cpu and rocgdb-gpu (expansion
  is non-recursive, so rocgdb alone would not resolve further)
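
To illustrate why the non-recursive expansion matters, here is a small sketch. The dictionary shapes and the function `select_test_jobs` are assumptions for illustration; the real TEST_SUBPROJECTS data and determine_rocm_test_dependencies.py logic may differ.

```python
# Job keys as used by the test matrix (from the commit message).
TEST_JOB_KEYS = {"rocgdb-cpu", "rocgdb-gpu"}

# Hypothetical "before" and "after" views of the amd-dbgapi mapping.
OLD_TEST_SUBPROJECTS = {"amd-dbgapi": ["rocgdb"]}
NEW_TEST_SUBPROJECTS = {
    "rocgdb": ["rocgdb-cpu", "rocgdb-gpu"],
    "amd-dbgapi": ["rocgdb-cpu", "rocgdb-gpu"],
}


def select_test_jobs(project: str, subprojects: dict, job_keys: set) -> list:
    """Expand one level only, then keep names that are real job keys."""
    # Names are used exactly as listed; "rocgdb" is never expanded again.
    names = subprojects.get(project, [])
    return sorted(name for name in names if name in job_keys)
```

With the old mapping, amd-dbgapi expands to "rocgdb", which is not a job key, so nothing is selected; with the new mapping both jobs are selected.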

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
@lumachad lumachad force-pushed the users/lumachad/rocgdb/multi-arch-ci branch from 97f6cb5 to c94e568 Compare June 19, 2026 23:25
lumachad and others added 3 commits June 19, 2026 18:34
When an external repo (e.g. ROCgdb) passes extra_cmake_options such as
-DTHEROCK_USE_EXTERNAL_ROCGDB=ON, those flags were previously injected
into every stage's configure step — even stages that have no knowledge of
the external component.

This commit scopes extra_cmake_options to the stages that actually own the
submodule, and skips stages whose artifacts are not needed at all.

stage_impact.py:
  - Add StageImpactAnalyzer.required_stages_for_component(submodule_name)
    that walks upstream: owning stage(s) + every stage whose artifacts
    they depend on transitively. Uses topology.get_source_set_for_submodule()
    and get_source_set_to_stages() directly; no new build_topology.py
    methods needed.
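
The upstream walk can be sketched as a plain graph traversal. This is an illustration, not the StageImpactAnalyzer code: `required_stages` and the toy `STAGE_DEPS` map are hypothetical, and "other-stage" is a made-up stage name.

```python
from collections import deque

# Toy dependency map: stage -> stages whose artifacts it consumes.
STAGE_DEPS = {
    "compiler-runtime": [],
    "debug-tools": ["compiler-runtime"],
    "other-stage": ["compiler-runtime"],  # hypothetical unrelated stage
}


def required_stages(owning_stages, stage_deps) -> set:
    """Return the owning stages plus everything they depend on transitively."""
    required = set()
    queue = deque(owning_stages)
    while queue:
        stage = queue.popleft()
        if stage in required:
            continue  # already visited via another path
        required.add(stage)
        queue.extend(stage_deps.get(stage, ()))
    return required
```

For an owning stage of debug-tools, this toy map yields compiler-runtime and debug-tools, matching the ROCgdb example given in this commit message.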

detect_external_repo_config.py:
  - Add _derive_build_stages(skip_submodules) which calls
    required_stages_for_component() for each skipped submodule and
    deduplicates. Populates build_stages (comma-separated) in config_json.
    Empty string means no restriction (all stages run).
  - Forward skip_packaging from external_repo JSON into config_json so
    downstream steps can suppress packaging jobs without extra lookups.
  - Add build_tools/ to sys.path so _therock_utils is importable when the
    script is invoked as `python build_tools/github_actions/...` from root.

Build workflows:
  - multi_arch_build_portable_linux_artifacts.yml,
    multi_arch_build_windows_artifacts.yml,
    multi_arch_build_wsl_rocdxg_artifacts.yml: gate extra_cmake_options on
    build_stages — flags are only injected when build_stages is empty or
    lists the current stage_name.
  - multi_arch_build_portable_linux.yml: skip each non-compiler-runtime stage
    when build_stages is non-empty and does not include that stage.
    compiler-runtime is always built since every stage depends on it.

Example: ROCgdb derives ["compiler-runtime", "debug-tools"], so only those
two stages run and only debug-tools receives -DTHEROCK_USE_EXTERNAL_ROCGDB=ON.
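
The stage-skipping rule described for multi_arch_build_portable_linux.yml can be expressed as a small predicate. The workflow itself implements this in YAML conditions; the Python below is only a sketch, and `parse_build_stages` and `stage_runs` are hypothetical names.

```python
def parse_build_stages(build_stages: str) -> list:
    """Split the comma-separated build_stages value, ignoring blanks."""
    return [name.strip() for name in build_stages.split(",") if name.strip()]


def stage_runs(build_stages: str, stage_name: str) -> bool:
    """Decide whether a stage is built for this run."""
    stages = parse_build_stages(build_stages)
    if stage_name == "compiler-runtime":
        return True  # always built, since every stage depends on it
    if not stages:
        return True  # empty string means no restriction
    return stage_name in stages
```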

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

When a component like ROCgdb triggers multi-arch CI, DEB/RPM packages,
Python wheels, and PyTorch wheels are unnecessary overhead — only the
build and test stages matter.

External repos opt in by setting "skip_packaging": true in their
external_repo JSON. detect_external_repo_config.py forwards this field
into external_repo_config (added in the previous commit). This commit consumes it:

configure_multi_arch_ci.py:
  - Add build_python_packages field to LinuxBuildConfig (dataclass).
  - Read skip_packaging from EXTERNAL_REPO_JSON env var (the raw caller
    JSON, available at configure time before the detect step runs) and
    disable build_native_linux, build_python_packages, and build_pytorch
    when true.
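
A rough sketch of the configure-time handling, assuming a reduced LinuxBuildConfig with only the three flags named above and a hypothetical function `apply_skip_packaging` (the real dataclass in configure_multi_arch_ci.py has more fields and its own defaults):

```python
import json
import os
from dataclasses import dataclass


@dataclass
class LinuxBuildConfig:
    # Reduced, illustrative version; defaults here are assumptions.
    build_native_linux: bool = True
    build_python_packages: bool = True
    build_pytorch: bool = True


def apply_skip_packaging(config: LinuxBuildConfig, env=os.environ) -> LinuxBuildConfig:
    """Disable packaging-related builds when the caller opted in."""
    raw = env.get("EXTERNAL_REPO_JSON", "")
    external = json.loads(raw) if raw else {}
    if isinstance(external, dict) and external.get("skip_packaging") is True:
        config.build_native_linux = False
        config.build_python_packages = False
        config.build_pytorch = False
    return config
```

Passing the environment as a parameter keeps the function easy to exercise in unit tests without mutating `os.environ`.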

setup_multi_arch.yml:
  - Pass EXTERNAL_REPO_JSON: ${{ inputs.external_repo }} to the configure
    step so configure_multi_arch_ci.py can read it directly.

multi_arch_ci_linux.yml, multi_arch_ci_windows.yml:
  - Guard build_python_packages job: add
    fromJSON(inputs.build_config).build_python_packages == true to if:.
  - Guard test_python_packages_per_family with the same condition.
    Without this guard, a skipped build job still satisfied
    !failure() && !cancelled(), causing the test job to run spuriously.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Document the external repo integration feature for TheRock's multi-arch CI:

- external_repo JSON format and all supported fields
- Stage scoping: how build_stages is derived from BUILD_TOPOLOGY.toml
  via StageImpactAnalyzer.required_stages_for_component(), and how it
  controls which stages run and which stages receive extra_cmake_options
- Packaging suppression: skip_packaging: true opt-in
- Step-by-step guide for adding a new external repo (REPO_CONFIGS entry,
  CMake variable wiring, caller workflow setup)
- Reference links to all relevant source files

Cross-link from ci_overview.md Infrastructure section.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
@lumachad lumachad force-pushed the users/lumachad/rocgdb/multi-arch-ci branch from c94e568 to 9b9a312 Compare June 19, 2026 23:34
Labels

ci:skip Skip all CI builds/tests for this PR

Projects

Status: TODO

1 participant