Commit 6e41a6d

Add Bazel PyPI manifest extraction (#1324)
This PR makes Bazel manifest creation Python-aware.

This builds on the Maven Bazel work from [#1312](#1312), which closes an inline-declaration gap that exists in `rules_jvm_external`: Bazel can resolve Maven artifacts that do not exist in a checked-in Maven manifest.

Python is different. `rules_python` commonly resolves packages from a checked-in pinned requirements or lock file and exposes those packages as Bazel labels. It works like this: a Bazel Python rule points to a checked-in requirements file. Bazel reads that file and makes the declared packages available as dependencies in the configured pip hub. Future Bazel build targets can then directly declare dependencies on those Python packages.

What this PR does is emit a generated `requirements.txt` that contains only the pinned Python packages reachable from Bazel Python rules. It does not mutate or remove entries from the user's checked-in requirements file. The value is scoping the generated manifest to Bazel's reached package set instead of assuming every checked-in requirement is used by Bazel Python targets.

This functionality does not kick in automatically, since I'm not fully convinced it won't cause more harm than good or cause confusion. It has to be manually enabled with `socket manifest bazel --ecosystem pypi`. `socket scan create --auto-manifest` continues to generate Bazel Maven manifests only.

## Summary of changes

- add `socket manifest bazel --ecosystem pypi` support for whole-repo Bazel PyPI `requirements.txt` generation
- discover rules_python pip hubs via Bazel command output first, with bounded static fallback paths
- keep Bazel PyPI generation explicit; `socket scan create --auto-manifest` continues to generate Bazel Maven only
- add bounded verbose diagnostics for Bazel subprocess, discovery, extraction, and empty-result triage
- document the new command surface and add exact constructed-fixture oracle coverage
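The scoping idea in the description above can be illustrated with a small sketch. This is not the PR's implementation: the function names (`parse_pins`, `scoped_requirements`) are hypothetical, the lockfile parser is deliberately simplified (it ignores hashes, options, and anything that is not an exact `==` pin), and the reached package set is assumed to have been computed elsewhere, for example from `bazel query` output.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, runs of '-', '_', '.' become one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(lock_text: str) -> dict:
    """Collect exact name==version pins from requirements-style text.

    Simplified on purpose: comments, option lines (such as --hash or -e),
    and requirements that are not exact pins are skipped.
    """
    pins = {}
    for raw in lock_text.splitlines():
        # Drop trailing comments and line-continuation backslashes.
        line = raw.split("#", 1)[0].strip().rstrip("\\").strip()
        if not line or line.startswith("-"):
            continue
        # Drop environment markers such as '; python_version < "3.11"'.
        line = line.split(";", 1)[0].strip()
        match = re.fullmatch(
            r"([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*==\s*(\S+)", line
        )
        if match:
            pins[normalize(match.group(1))] = match.group(2)
    return pins


def scoped_requirements(lock_text: str, reached) -> list:
    """Return sorted name==version lines for reached packages only."""
    pins = parse_pins(lock_text)
    wanted = {normalize(name) for name in reached}
    return [f"{name}=={pins[name]}" for name in sorted(wanted) if name in pins]
```

The checked-in lockfile is only read, never rewritten, which mirrors the description's point that the user's requirements file is left untouched.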
1 parent d1c99be commit 6e41a6d

21 files changed

Lines changed: 3807 additions & 68 deletions

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -7,6 +7,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ## [Unreleased]
 - **`socket manifest bazel [beta]`** — Generate Bazel JVM SBOM manifests by running `bazel query` against discovered Maven repos in a Bazel workspace. Closes the inline-Maven-declaration gap that lockfile-only parsing misses for repos like envoy, ray, tensorflow, tink-java, and or-tools. Auto-detects Bzlmod and legacy `WORKSPACE`.
 - **`socket scan create --auto-manifest`** now covers Bazel workspaces in addition to Gradle/Scala/Kotlin/Conda. Repos with `MODULE.bazel`, `WORKSPACE`, or `WORKSPACE.bazel` are detected automatically and their Maven dependencies extracted as part of the standard scan-create flow.
+- **Bazel PyPI extraction**: `socket manifest bazel --ecosystem pypi` now generates `requirements.txt` for Python Bazel workspaces. Discovers custom `rules_python` pip hub names with Bazel command output first, queries `py_library` / `py_binary` / `py_test` dependencies, resolves canonical pinned versions from `requirements_lock.txt`, and emits PEP 503-normalized `name==version` lines. Supports both Bzlmod (`pip.parse`) and legacy `WORKSPACE` (`pip_parse` / `pip_install`) configurations. PyPI remains explicit opt-in for `socket scan create --auto-manifest` until real-world no-lockfile recovery is validated.
+
+### Changed
+- **Bazel diagnostics**: `socket manifest bazel --verbose` now emits bounded subprocess traces with argv, cwd, duration, exit status, output sizes, and failure stderr tails to make customer log-only triage safer and faster.

 ## [1.1.98](https://github.com/SocketDev/socket-cli/releases/tag/v1.1.98) - 2026-05-22

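The changelog entry above mentions PEP 503-normalized `name==version` lines. As a rough illustration, the normalization rule is the one given in PEP 503 itself. The label-to-name helper is an assumption for illustration only: it presumes hub labels of the form `@<hub>//<package>` or `@<hub>//<package>:<target>`, and `package_from_label` is a hypothetical name, not a function from this PR.

```python
import re


def pep503_normalize(name: str) -> str:
    """Lowercase and collapse any run of '-', '_', '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def package_from_label(label: str, hub: str):
    """Map a pip-hub label to a normalized PyPI name, or None if it is not one.

    Assumes labels shaped like '@pypi//charset_normalizer:pkg'. Canonical
    Bzlmod repository names are not handled in this sketch.
    """
    pattern = rf"@{re.escape(hub)}//([A-Za-z0-9._-]+)(?::[A-Za-z0-9._-]+)?"
    match = re.fullmatch(pattern, label)
    return pep503_normalize(match.group(1)) if match else None
```

Passing the hub name as a parameter keeps the helper compatible with custom hub names such as `my_pypi`.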
src/commands/manifest/README.md

Lines changed: 79 additions & 13 deletions
@@ -16,13 +16,14 @@ manifest generator. Useful when you do not want to spell out the language.

 ## socket manifest bazel [beta]

-Generates Bazel JVM SBOM manifests (`maven_install.json`-shaped) by running
-`bazel query` against discovered Maven repos in a Bazel workspace. Output is
-consumed by `socket scan create` and closes the
-inline-Maven-declaration gap that lockfile-only parsing misses.
+Generates Bazel SBOM manifests (Maven `maven_install.json` and/or PyPI
+`requirements.txt`) by running `bazel query` against discovered ecosystem
+hubs in a Bazel workspace. Output is consumed by `socket scan create` and
+closes the inline-declaration gap that lockfile-only parsing misses for
+Bazel monorepos.

-> **Note**: This command generates Maven dependency manifests for Bazel JVM
-> workspaces. It does not run reachability analysis.
+> **Note**: This command generates dependency manifests for Bazel
+> workspaces (Maven and PyPI). It does not run reachability analysis.

 ### Usage

@@ -36,33 +37,98 @@ socket manifest bazel [options] [DIR=.]
 - `--bazel-rc <path>` — path to additional `.bazelrc` fragments forwarded to bazel.
 - `--bazel-flags <str>` — flags forwarded to every bazel invocation (single quoted string).
 - `--bazel-output-base <dir>` — Bazel `--output_base` for read-only-cache CI environments.
+- `--ecosystem <name>` — ecosystem(s) to extract; repeatable. Supported values: `maven`, `pypi`. When omitted, Maven is generated by default; PyPI is explicit opt-in.
 - `--out <dir>` — output directory; default `./.socket/bazel-manifests/`.
 - `--dry-run`, `--verbose` — standard diagnostic flags.

 > **Upload**: This subcommand only generates manifests. To generate and
 > upload in one step, use `socket scan create --auto-manifest .` — it
-> detects the workspace, runs the same extraction this subcommand performs,
-> and uploads the result.
+> detects the workspace, generates Bazel Maven manifests, and uploads the
+> result. Generate Bazel PyPI manifests explicitly with `socket manifest bazel
+> --ecosystem pypi`, then scan the generated output with `socket scan create`.

 ### Examples

 ```bash
-# Generate maven manifests from the current Bazel workspace.
+# Generate the default Bazel Maven manifest from the current workspace.
 socket manifest bazel .

+# Generate only the PyPI manifest.
+socket manifest bazel . --ecosystem pypi
+
+# Generate both Maven and PyPI manifests explicitly.
+socket manifest bazel . --ecosystem maven --ecosystem pypi
+
 # Use bazelisk explicitly.
 socket manifest bazel --bazel=/usr/local/bin/bazelisk .
 ```

+### Python/PyPI Extraction
+
+When `--ecosystem pypi` is selected, the command:
+
+1. Discovers `rules_python` pip hubs from Bazel's `mod show_extension` output when available, with bounded static parsing of `MODULE.bazel` (`pip.parse(hub_name = "...")`) and legacy `WORKSPACE` (`pip_parse(name = "...")` / `pip_install(name = "...")`) retained as fallback. Hub names are never hardcoded; custom names like `my_pypi` are detected automatically.
+2. Validates each candidate hub by probing it with `bazel query` for `:pkg` targets / `alias(` rules. Invalid candidates are dropped.
+3. Runs `bazel query 'deps(kind("py_library|py_binary|py_test", //...))'` to determine which PyPI packages are actually reached by Python rules in the repo (test dependencies included for whole-repo scope).
+4. Reads `requirements_lock.txt` (the path discovered from `pip.parse(requirements_lock = "...")`) for canonical pinned versions. When the lockfile is unavailable, falls back to parsing `pypi_name=` and `pypi_version=` tags from the spoke `py_library` rules in the hub-and-spoke architecture.
+5. Emits a sorted canonical `requirements.txt` containing `name==version` lines for every reached package.
+
+### PyPI Name and Version Semantics
+
+- **PEP 503 normalization.** Package matching uses PEP 503 normalization
+(lowercase, then any run of `-`, `_`, or `.` is collapsed to a single
+`-`). Bazel target names use underscores (`charset_normalizer`); PyPI
+canonical names use hyphens (`charset-normalizer`). The emitted
+`requirements.txt` always uses the canonical hyphenated form.
+- **Lockfile pins win.** When the lockfile and spoke-repo tags disagree on
+a version, the lockfile wins because that is the version Bazel actually
+resolves at analysis time. A `--verbose` warning is logged for the
+divergence.
+- **Conflict detection.** When two reached packages normalize to the same
+PyPI name with different versions, the command fails clearly: a single
+`requirements.txt` cannot represent both versions, and silently
+picking one would produce a misleading SBOM.
+
+### Unsupported PyPI Forms
+
+The PyPI extractor is intentionally narrow in this phase:
+
+- **Direct URL, editable (`-e`), and unpinned requirements** are not
+emitted. Only canonical `name==version` lines from the resolved
+lockfile are produced. Repositories that rely on unpinned or
+URL-pinned requirements will see those packages omitted from
+`requirements.txt`.
+- **Private corpus validation** requires authenticated GitHub access.
+When credentials are unavailable, the bazel-bench harness's private
+PyPI case skips cleanly with a distinct reason rather than failing.
+- **Whole-repo extraction.** The initial PyPI implementation emits one
+whole-workspace manifest. Per-target PyPI slicing is not currently
+supported.
+
+### Cross-Language Edges
+
+Bazel repos with cross-language dependencies (e.g. `rust_library` ↔
+`py_library` via PyO3 / cffi / etc.) are **not** traversed by the PyPI
+extractor in this phase. The PyPI extractor only covers Python rule
+dependencies reachable from `py_library`, `py_binary`, and `py_test`
+targets. Cross-language edges are assigned to Phase 4. The bazel-bench
+fixture `constructed/python-pypi` includes Go/Rust sidecars as
+validation context only; they are intentionally not asserted by the
+PyPI correctness cases.
+
 ### Requirements

 - `bazel` or `bazelisk` on `PATH` (or pass `--bazel <path>`).
-- Network access on cold cache. Bazel and `rules_jvm_external` own their own
-retry policy for transient Maven resolution failures — `socket manifest bazel`
-does not retry on top of them.
+- Network access on cold cache. Bazel and `rules_jvm_external` /
+`rules_python` own their own retry policy for transient resolution
+failures — `socket manifest bazel` does not retry on top of them.
 - Writable Bazel output base; pass `--bazel-output-base` for read-only-cache CI.
+- For PyPI extraction: a Python 3 interpreter on `PATH` so the
+rules_python toolchain can analyze the workspace.

-This is the user-visible entry point for Bazel JVM SBOM support; the [beta] label and "Bazel JVM SBOM support" wording must stay consistent across release notes and docs.
+This is the user-visible entry point for Bazel SBOM support (Maven and
+PyPI); the [beta] label and "Bazel SBOM support" wording must stay
+consistent across release notes and docs.

 ## socket manifest cdxgen

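The version semantics documented in the README diff above (lockfile pins win, and same-name conflicts fail) could be sketched roughly as follows. Everything here is illustrative: `resolve_pins` and `VersionConflict` are hypothetical names, and the inputs are assumed to be already-parsed `(name, version)` pairs from the lockfile and from spoke-repo tags.

```python
import re


class VersionConflict(Exception):
    """Raised when one normalized name maps to two different versions."""


def _norm(name: str) -> str:
    # PEP 503 normalization.
    return re.sub(r"[-_.]+", "-", name).lower()


def resolve_pins(lock_pins, spoke_pins, reached):
    """Return (requirements_lines, warnings) for the reached packages.

    lock_pins and spoke_pins are iterables of (name, version) pairs.
    Lockfile versions take precedence; spoke tags are only a fallback.
    """
    wanted = {_norm(name) for name in reached}

    def collect(pairs, source):
        pins = {}
        for name, version in pairs:
            key = _norm(name)
            if key not in wanted:
                continue  # only reached packages can conflict
            if pins.setdefault(key, version) != version:
                raise VersionConflict(
                    f"{key}: {pins[key]} and {version} both found in {source}"
                )
        return pins

    lock = collect(lock_pins, "lockfile")
    spoke = collect(spoke_pins, "spoke tags")

    lines, warnings = [], []
    for key in sorted(wanted):
        version = lock.get(key, spoke.get(key))
        if version is None:
            continue  # no pin known; package is omitted
        if key in lock and key in spoke and lock[key] != spoke[key]:
            warnings.append(
                f"{key}: lockfile {lock[key]} overrides spoke tag {spoke[key]}"
            )
        lines.append(f"{key}=={version}")
    return lines, warnings
```

Failing on a conflict rather than picking a version follows the README's reasoning that a single `requirements.txt` cannot represent both versions.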