This PR makes Bazel manifest creation Python-aware.
This builds on the Maven Bazel work from [#1312](#1312), which closes an inline-declaration gap that exists in `rules_jvm_external`: Bazel can resolve Maven artifacts that do not exist in a checked-in Maven manifest. Python is different. `rules_python` commonly resolves packages from a checked-in pinned requirements or lock file and exposes those packages as Bazel labels.
It works like this: a Bazel Python rule points to a checked-in requirements file. Bazel reads that file and makes the declared packages available as dependencies in the configured pip hub. Future Bazel build targets can then directly declare dependencies on those Python packages.
What this PR does is emit a generated `requirements.txt` that contains only the pinned Python packages reachable from Bazel Python rules. It does not mutate or remove entries from the user's checked-in requirements file. The value is scoping the generated manifest to Bazel's reached package set instead of assuming every checked-in requirement is used by Bazel Python targets.
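As a rough illustration of the scoping idea (not the PR's actual implementation), the sketch below filters a set of pinned requirements down to the packages Bazel reached. The function name `scope_requirements` and the dict/set inputs are hypothetical, and names are assumed to be normalized already.

```python
from __future__ import annotations


def scope_requirements(pins: dict[str, str], reached: set[str]) -> str:
    """Render only the pinned packages that Bazel Python rules reach.

    `pins` maps package name -> pinned version (as read from the checked-in
    file); `reached` holds the package names found via Bazel. Neither input
    is mutated, mirroring the "do not touch the user's file" behavior.
    """
    lines = [f"{name}=={pins[name]}" for name in sorted(reached) if name in pins]
    return "\n".join(lines) + ("\n" if lines else "")
```

With `pins` holding `requests`, `flask`, and `numpy` and only `requests` and `numpy` reached, the generated text lists those two packages and leaves `flask` out.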
This functionality is not enabled automatically, since I'm not yet convinced it will do more good than harm or avoid confusion. It has to be enabled manually with `socket manifest bazel --ecosystem pypi`. `socket scan create --auto-manifest` continues to generate Bazel Maven manifests only.
## Summary of changes
- add `socket manifest bazel --ecosystem pypi` support for whole-repo Bazel PyPI `requirements.txt` generation
- discover rules_python pip hubs via Bazel command output first, with bounded static fallback paths
- keep Bazel PyPI generation explicit; `socket scan create --auto-manifest` continues to generate Bazel Maven only
- add bounded verbose diagnostics for Bazel subprocess, discovery, extraction, and empty-result triage
- document the new command surface and add exact constructed-fixture oracle coverage
Changed file: `CHANGELOG.md` (4 additions, 0 deletions)
@@ -7,6 +7,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]
- **`socket manifest bazel [beta]`**: Generate Bazel JVM SBOM manifests by running `bazel query` against discovered Maven repos in a Bazel workspace. Closes the inline-Maven-declaration gap that lockfile-only parsing misses for repos like envoy, ray, tensorflow, tink-java, and or-tools. Auto-detects Bzlmod and legacy `WORKSPACE`.
- **`socket scan create --auto-manifest`** now covers Bazel workspaces in addition to Gradle/Scala/Kotlin/Conda. Repos with `MODULE.bazel`, `WORKSPACE`, or `WORKSPACE.bazel` are detected automatically and their Maven dependencies extracted as part of the standard scan-create flow.
- **Bazel PyPI extraction**: `socket manifest bazel --ecosystem pypi` now generates `requirements.txt` for Python Bazel workspaces. Discovers custom `rules_python` pip hub names with Bazel command output first, queries `py_library` / `py_binary` / `py_test` dependencies, resolves canonical pinned versions from `requirements_lock.txt`, and emits PEP 503-normalized `name==version` lines. Supports both Bzlmod (`pip.parse`) and legacy `WORKSPACE` (`pip_parse` / `pip_install`) configurations. PyPI stays explicit opt-in and is not generated by `socket scan create --auto-manifest` until real-world no-lockfile recovery is validated.

### Changed

- **Bazel diagnostics**: `socket manifest bazel --verbose` now emits bounded subprocess traces with argv, cwd, duration, exit status, output sizes, and failure stderr tails to make customer log-only triage safer and faster.
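To make the shape of such a bounded trace concrete, here is a minimal sketch, assuming a trace is a plain dictionary. The helper name `run_traced`, the field names, and the `tail_bytes` limit are hypothetical and are not taken from the CLI's implementation.

```python
from __future__ import annotations

import os
import subprocess
import time


def run_traced(argv: list[str], cwd: str | None = None, tail_bytes: int = 2048):
    """Run a command and return (completed_process, trace).

    The trace records sizes rather than full output, and keeps only the
    last `tail_bytes` of stderr, and only when the command failed.
    """
    start = time.monotonic()
    proc = subprocess.run(argv, cwd=cwd, capture_output=True)
    trace = {
        "argv": list(argv),
        "cwd": cwd if cwd is not None else os.getcwd(),
        "duration_s": round(time.monotonic() - start, 3),
        "exit_status": proc.returncode,
        "stdout_bytes": len(proc.stdout),
        "stderr_bytes": len(proc.stderr),
    }
    if proc.returncode != 0:
        # Bounded tail: enough for triage without dumping the whole log.
        trace["stderr_tail"] = proc.stderr[-tail_bytes:].decode("utf-8", "replace")
    return proc, trace
```

Recording byte counts instead of full output keeps the log size predictable even when a Bazel invocation prints a lot.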
- `--bazel-rc <path>`: path to additional `.bazelrc` fragments forwarded to bazel.
- `--bazel-flags <str>`: flags forwarded to every bazel invocation (single quoted string).
- `--bazel-output-base <dir>`: Bazel `--output_base` for read-only-cache CI environments.
- `--ecosystem <name>`: ecosystem(s) to extract; repeatable. Supported values: `maven`, `pypi`. When omitted, Maven is generated by default; PyPI is explicit opt-in.
1. Discovers `rules_python` pip hubs from Bazel's `mod show_extension` output when available, with bounded static parsing of `MODULE.bazel` (`pip.parse(hub_name = "...")`) and legacy `WORKSPACE` (`pip_parse(name = "...")` / `pip_install(name = "...")`) retained as fallback. Hub names are never hardcoded; custom names like `my_pypi` are detected automatically.
2. Validates each candidate hub by probing it with `bazel query` for `:pkg` targets / `alias(` rules. Invalid candidates are dropped.
3. Runs `bazel query 'deps(kind("py_library|py_binary|py_test", //...))'` to determine which PyPI packages are actually reached by Python rules in the repo (test dependencies included for whole-repo scope).
4. Reads `requirements_lock.txt` (the path discovered from `pip.parse(requirements_lock = "...")`) for canonical pinned versions. When the lockfile is unavailable, falls back to parsing `pypi_name=` and `pypi_version=` tags from the spoke `py_library` rules in the hub-and-spoke architecture.
5. Emits a sorted canonical `requirements.txt` containing `name==version` lines for every reached package.

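The static fallback in step 1 and the label filtering implied by step 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the extractor's code: the function names and regular expressions are hypothetical, the patterns ignore nested parentheses and computed names, and labels are assumed to look like `@<hub>//<package>:pkg` (canonical Bzlmod repository names would need extra mapping).

```python
from __future__ import annotations

import re

# Hypothetical patterns for the static fallback. Real Starlark can defeat
# them, which is one reason Bazel command output is tried first.
_BZLMOD_CALL = re.compile(r"\bpip\.parse\s*\(([^)]*)\)")
_LEGACY_CALL = re.compile(r"\bpip_(?:parse|install)\s*\(([^)]*)\)")
_HUB_NAME = re.compile(r'\bhub_name\s*=\s*"([^"]+)"')
_NAME = re.compile(r'\bname\s*=\s*"([^"]+)"')
_PKG_LABEL = re.compile(r"^@{1,2}([^/]+)//([^:/]+):pkg$")


def static_hub_names(module_text: str = "", workspace_text: str = "") -> list[str]:
    """Collect candidate hub names from MODULE.bazel and WORKSPACE text."""
    names: list[str] = []
    for body in _BZLMOD_CALL.findall(module_text):
        names += _HUB_NAME.findall(body)
    for body in _LEGACY_CALL.findall(workspace_text):
        names += _NAME.findall(body)
    return sorted(set(names))


def reached_packages(query_output: str, hubs: list[str]) -> set[str]:
    """Keep package names from query labels that belong to a known hub."""
    found: set[str] = set()
    for line in query_output.splitlines():
        match = _PKG_LABEL.match(line.strip())
        if match and match.group(1) in hubs:
            found.add(match.group(2))
    return found
```

Candidates found this way would still go through the validation probe in step 2 before being trusted.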
### PyPI Name and Version Semantics

- **PEP 503 normalization.** Package matching uses PEP 503 normalization
  (lowercase, then any run of `-`, `_`, or `.` is collapsed to a single
  `-`). Bazel target names use underscores (`charset_normalizer`); PyPI
  canonical names use hyphens (`charset-normalizer`). The emitted
  `requirements.txt` always uses the canonical hyphenated form.
- **Lockfile pins win.** When the lockfile and spoke-repo tags disagree on
  a version, the lockfile wins because that is the version Bazel actually
  resolves at analysis time. A `--verbose` warning is logged for the
  divergence.
- **Conflict detection.** When two reached packages normalize to the same
  PyPI name with different versions, the command fails clearly: a single
  `requirements.txt` cannot represent both versions, and silently
  picking one would produce a misleading SBOM.

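The three rules above can be sketched in a few lines. The `normalize` function follows the PEP 503 recipe; `pick_version` and `render_requirements` are hypothetical helper names used only for illustration, and raising `ValueError` on conflict is an assumption about how "fails clearly" might look.

```python
from __future__ import annotations

import re


def normalize(name: str) -> str:
    """PEP 503: collapse runs of '-', '_', '.' to a single '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pick_version(lock: str | None, spoke: str | None) -> tuple[str | None, bool]:
    """Prefer the lockfile pin; the flag reports a divergence worth a warning."""
    diverged = lock is not None and spoke is not None and lock != spoke
    return (lock if lock is not None else spoke), diverged


def render_requirements(reached: list[tuple[str, str]]) -> str:
    """Emit sorted `name==version` lines, refusing ambiguous pins."""
    by_name: dict[str, set[str]] = {}
    for raw_name, version in reached:
        by_name.setdefault(normalize(raw_name), set()).add(version)
    conflicts = {n: sorted(v) for n, v in by_name.items() if len(v) > 1}
    if conflicts:
        # One requirements.txt cannot hold two versions of the same package.
        raise ValueError(f"conflicting pins: {conflicts}")
    return "".join(f"{n}=={next(iter(v))}\n" for n, v in sorted(by_name.items()))
```

Grouping by normalized name before checking versions is what lets `typing_extensions` and `typing-extensions` be recognized as the same package.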
### Unsupported PyPI Forms

The PyPI extractor is intentionally narrow in this phase:

- **Direct URL, editable (`-e`), and unpinned requirements** are not
  emitted. Only canonical `name==version` lines from the resolved
  lockfile are produced. Repositories that rely on unpinned or
  URL-pinned requirements will see those packages omitted from
  `requirements.txt`.
- **Private corpus validation** requires authenticated GitHub access.
  When credentials are unavailable, the bazel-bench harness's private
  PyPI case skips cleanly with a distinct reason rather than failing.
- **Whole-repo extraction.** The initial PyPI implementation emits one
  whole-workspace manifest. Per-target PyPI slicing is not currently
  supported.

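One way to picture the first restriction is a small classifier over lockfile lines. This is a simplified sketch, not the extractor's parser: the function names and the regular expression are hypothetical, environment markers are ignored, and anything that is not an exact `name==version` pin is reported as skipped.

```python
from __future__ import annotations

import re

# Exact pins only: `==` followed by a version with no wildcard.
_PINNED = re.compile(
    r"^(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*==\s*"
    r"(?P<version>[^\s;*=]+)(?=\s|;|$)"
)


def logical_lines(text: str):
    """Join backslash-continued lines and drop blanks and comment lines."""
    buf = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            buf += line[:-1] + " "
            continue
        buf = (buf + line).strip()
        if buf and not buf.startswith("#"):
            yield buf
        buf = ""
    if buf.strip():
        yield buf.strip()


def split_requirements(text: str):
    """Return (pins, skipped): emitted pins and (reason, line) for the rest."""
    pins: list[tuple[str, str]] = []
    skipped: list[tuple[str, str]] = []
    for line in logical_lines(text):
        line = re.split(r"\s+#", line, maxsplit=1)[0]
        if line.startswith(("-e ", "--editable")):
            skipped.append(("editable", line))
        elif line.startswith("-"):
            continue  # index, hash, or include options carry no package pin
        elif (match := _PINNED.match(line)):
            pins.append((match["name"], match["version"]))
        elif " @ " in line or "://" in line:
            skipped.append(("url", line))
        else:
            skipped.append(("unpinned", line))
    return pins, skipped
```

Returning the skipped lines with a reason, rather than dropping them silently, would make it easier to explain why a package is missing from the generated file.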
### Cross-Language Edges

Bazel repos with cross-language dependencies (e.g. `rust_library` →
`py_library` via PyO3 / cffi / etc.) are **not** traversed by the PyPI
extractor in this phase. The PyPI extractor only covers Python rule
dependencies reachable from `py_library`, `py_binary`, and `py_test`
targets. Cross-language edges are assigned to Phase 4. The bazel-bench
fixture `constructed/python-pypi` includes Go/Rust sidecars as
validation context only; they are intentionally not asserted by the
PyPI correctness cases.

### Requirements

- `bazel` or `bazelisk` on `PATH` (or pass `--bazel <path>`).
- Network access on cold cache. Bazel and `rules_jvm_external` /
  `rules_python` apply their own retry policy for transient resolution
  failures; `socket manifest bazel` does not retry on top of them.
- Writable Bazel output base; pass `--bazel-output-base` for read-only-cache CI.
- For PyPI extraction: a Python 3 interpreter on `PATH` so the
  `rules_python` toolchain can analyze the workspace.

This is the user-visible entry point for Bazel SBOM support (Maven and
PyPI); the [beta] label and "Bazel SBOM support" wording must stay
consistent across release notes and docs.