# Inject proxy environment variables into systemd manager at boot#126
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new function configure_systemd_proxy in startup_script/gce-proxy-setup.sh to inject proxy environment variables into the systemd manager configuration. The review feedback points out an asymmetric variable check bug that could inject empty values, suggests checking if systemd is running to prevent error noise in non-systemd environments, and recommends trimming a trailing space from the formatted environment string. A refactored code block is provided to address these issues.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if [[ -n "${HTTP_PROXY:-}" || -n "${HTTPS_PROXY:-}" ]]; then | ||
| echo "DEBUG: configure_systemd_proxy: Injecting proxy overrides into systemd manager..." >&2 | ||
| mkdir -p "${systemd_conf_dir}" | ||
|
|
||
| local env_strings=() | ||
| if [[ -n "${http_proxy:-}" ]]; then env_strings+=( "http_proxy=${http_proxy}" "HTTP_PROXY=${HTTP_PROXY}" ); fi | ||
| if [[ -n "${https_proxy:-}" ]]; then env_strings+=( "https_proxy=${https_proxy}" "HTTPS_PROXY=${HTTPS_PROXY}" ); fi | ||
| if [[ -n "${no_proxy:-}" ]]; then env_strings+=( "no_proxy=${no_proxy}" "NO_PROXY=${NO_PROXY}" ); fi | ||
|
|
||
| # Format as systemd DefaultEnvironment space-separated list | ||
| local default_env | ||
| default_env=$(printf '"%s" ' "${env_strings[@]}") | ||
|
|
||
| cat <<EOF > "${systemd_conf_file}" | ||
| [Manager] | ||
| DefaultEnvironment=${default_env} | ||
| EOF | ||
|
|
||
| echo "DEBUG: configure_systemd_proxy: Executing systemd daemon-reexec..." >&2 | ||
| systemctl daemon-reexec || true | ||
| fi |
There was a problem hiding this comment.
There are a few areas of improvement in this block:
- Asymmetric Variable Check Bug: Checking
http_proxyand then unconditionally appending bothhttp_proxyandHTTP_PROXYcan lead to issues. Ifhttp_proxyis unset butHTTP_PROXYis set, neither is added. Conversely, ifhttp_proxyis set butHTTP_PROXYis unset, it appendsHTTP_PROXY=, injecting an empty value. Checking each variable independently is much more robust. - Unnecessary systemctl Error Noise: In non-systemd environments (e.g., Docker container builds), running
systemctl daemon-reexecwill fail and print error messages to stderr. Checking if systemd is running ([[ -d /run/systemd/system ]]) before callingsystemctlkeeps the logs clean. - Trailing Space: The
printfcommand adds a trailing space todefault_env. We can trim it using parameter expansion${default_env% }.
if [[ -n "${HTTP_PROXY:-}" || -n "${HTTPS_PROXY:-}" || -n "${http_proxy:-}" || -n "${https_proxy:-}" ]]; then
echo "DEBUG: configure_systemd_proxy: Injecting proxy overrides into systemd manager..." >&2
mkdir -p "${systemd_conf_dir}"
local env_strings=()
if [[ -n "${http_proxy:-}" ]]; then env_strings+=( "http_proxy=${http_proxy}" ); fi
if [[ -n "${HTTP_PROXY:-}" ]]; then env_strings+=( "HTTP_PROXY=${HTTP_PROXY}" ); fi
if [[ -n "${https_proxy:-}" ]]; then env_strings+=( "https_proxy=${https_proxy}" ); fi
if [[ -n "${HTTPS_PROXY:-}" ]]; then env_strings+=( "HTTPS_PROXY=${HTTPS_PROXY}" ); fi
if [[ -n "${no_proxy:-}" ]]; then env_strings+=( "no_proxy=${no_proxy}" ); fi
if [[ -n "${NO_PROXY:-}" ]]; then env_strings+=( "NO_PROXY=${NO_PROXY}" ); fi
# Format as systemd DefaultEnvironment space-separated list
local default_env
default_env=$(printf '"%s" ' "${env_strings[@]}")
default_env=${default_env% }
cat <<EOF > "${systemd_conf_file}"
[Manager]
DefaultEnvironment=${default_env}
EOF
if [[ -d /run/systemd/system ]]; then
echo "DEBUG: configure_systemd_proxy: Executing systemd daemon-reexec..." >&2
systemctl daemon-reexec || true
fi
fi|
@Aditya-Shah - could use a review |
|
We can handle the case issue suggested by AI review above, rest looks good |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request updates gce-proxy-setup.sh to derive the cluster name from the hostname as a fallback, improves whitespace stripping for the no-proxy list, configures systemd proxy overrides, and registers a deferred systemd service (dataproc-proxy-init) during custom image builds. The review comments provide valuable feedback to ensure robustness: waiting for the network to be online before querying metadata, adding a fallback check for custom image builds since IS_CUSTOM_IMAGE_BUILD may not be set, using uname -n instead of hostname -s for better compatibility in minimal environments, and guarding systemctl calls to prevent failures when the command is unavailable.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| DefaultDependencies=no | ||
| After=systemd-udevd.service local-fs.target | ||
| Before=google-dataproc-agent.service |
There was a problem hiding this comment.
Since DefaultDependencies=no is set, this service runs extremely early in the boot process. At this stage, the network interface may not be fully initialized or online, causing the curl requests to the metadata server to fail. Because get_metadata_value does not retry on network failures, the script will fail to detect the proxy and skip configuration entirely.
Adding network-online.target to both After and Wants guarantees that the network is fully up before the script attempts to query the metadata server.
| DefaultDependencies=no | |
| After=systemd-udevd.service local-fs.target | |
| Before=google-dataproc-agent.service | |
| DefaultDependencies=no | |
| After=systemd-udevd.service local-fs.target network-online.target | |
| Wants=network-online.target | |
| Before=google-dataproc-agent.service |
| if [[ "${IS_CUSTOM_IMAGE_BUILD:-}" == "true" ]]; then | ||
| setup_deferred_service | ||
| fi |
There was a problem hiding this comment.
The environment variable IS_CUSTOM_IMAGE_BUILD is not set or exported by run.sh (the entry point for the custom image build workflow). As a result, setup_deferred_service will never be executed during image builds, and the deferred service will not be registered.
Falling back to checking the custom-sources-path metadata attribute provides a reliable way to detect if the script is running within the Dataproc custom image build workflow.
| if [[ "${IS_CUSTOM_IMAGE_BUILD:-}" == "true" ]]; then | |
| setup_deferred_service | |
| fi | |
| if [[ "${IS_CUSTOM_IMAGE_BUILD:-}" == "true" || -n "$(get_metadata_attribute 'custom-sources-path' '')" ]]; then | |
| setup_deferred_service | |
| fi |
| local hostname | ||
| hostname=$(hostname -s) |
There was a problem hiding this comment.
Using hostname -s spawns a subshell and relies on the external hostname command being installed. In minimal or containerized environments (such as minimal Docker images used during testing or custom builds), the hostname command may not be present, which will cause the script to fail under set -e.
Using uname -n (part of coreutils and guaranteed to be present) combined with bash parameter expansion is a more robust and efficient alternative.
| local hostname | |
| hostname=$(hostname -s) | |
| local hostname | |
| hostname=$(uname -n) | |
| hostname="${hostname%%.*}" |
|
|
||
| # 3. Enable the service so it runs on every boot | ||
| echo "INFO: setup_deferred_service: Enabling systemd service ${service_name}" >&2 | ||
| systemctl enable "${service_name}.service" |
There was a problem hiding this comment.
If the script is executed in an environment where systemctl is not installed (such as a minimal Docker container used for testing or building), this command will fail and terminate the script due to set -e.
Guarding the call with a check for the existence of the systemctl command prevents unexpected script failures.
| systemctl enable "${service_name}.service" | |
| if command -v systemctl >/dev/null 2>&1; then | |
| systemctl enable "${service_name}.service" | |
| else | |
| echo "WARNING: systemctl command not found. Skipping enabling of ${service_name}.service" >&2 | |
| fi |
a80879c to
21a6f87
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request updates gce-proxy-setup.sh to derive the cluster name from the hostname as a fallback, improves parsing of the no-proxy metadata, and introduces systemd proxy configuration overrides along with a deferred systemd service for custom image builds. The review feedback highlights critical issues where the script exits with exit 1 if systemd is not running as PID 1, which would break container-based or chroot-based custom image builds. The reviewer provided actionable suggestions to gracefully handle non-systemd environments by skipping daemon-reexec and manually enabling the systemd service if systemctl is unavailable.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if [[ ! -d /run/systemd/system ]]; then | ||
| echo "ERROR: configure_systemd_proxy: systemd is not running as PID 1. This environment is unsupported." >&2 | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
Exiting with exit 1 when systemd is not running as PID 1 will fail the script during container-based or chroot-based custom image builds (which the PR description explicitly notes should be supported).
Instead of failing, we should allow the configuration file to be written (so it is present when the image boots on a live VM) and only skip the systemctl daemon-reexec call.
| if [[ ! -d /run/systemd/system ]]; then | |
| echo "ERROR: configure_systemd_proxy: systemd is not running as PID 1. This environment is unsupported." >&2 | |
| exit 1 | |
| fi | |
| if [[ ! -d /run/systemd/system ]]; then | |
| echo "INFO: configure_systemd_proxy: systemd is not running as PID 1. Overrides will be written but daemon-reexec will be skipped." >&2 | |
| fi |
| echo "DEBUG: configure_systemd_proxy: Executing systemd daemon-reexec..." >&2 | ||
| systemctl daemon-reexec || true |
There was a problem hiding this comment.
Only execute systemctl daemon-reexec if systemd is actually running as PID 1 to avoid unnecessary errors or warnings in non-systemd environments.
| echo "DEBUG: configure_systemd_proxy: Executing systemd daemon-reexec..." >&2 | |
| systemctl daemon-reexec || true | |
| if [[ -d /run/systemd/system ]]; then | |
| echo "DEBUG: configure_systemd_proxy: Executing systemd daemon-reexec..." >&2 | |
| systemctl daemon-reexec || true | |
| fi |
| if ! command -v systemctl >/dev/null 2>&1 || [[ ! -d /run/systemd/system ]]; then | ||
| echo "ERROR: setup_deferred_service: systemd is not running as PID 1. Cannot enable deferred service." >&2 | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo "INFO: setup_deferred_service: Enabling systemd service ${service_name}" >&2 | ||
| systemctl enable "${service_name}.service" |
There was a problem hiding this comment.
Exiting with exit 1 here will fail custom image builds in environments where systemd is not running as PID 1 (such as Docker containers or chroots).
systemctl enable can actually run successfully in chroots/containers even if systemd is not running as PID 1. If systemctl is not available at all, we can fallback to manually creating the symlink or logging a warning instead of failing the build.
| if ! command -v systemctl >/dev/null 2>&1 || [[ ! -d /run/systemd/system ]]; then | |
| echo "ERROR: setup_deferred_service: systemd is not running as PID 1. Cannot enable deferred service." >&2 | |
| exit 1 | |
| fi | |
| echo "INFO: setup_deferred_service: Enabling systemd service ${service_name}" >&2 | |
| systemctl enable "${service_name}.service" | |
| if command -v systemctl >/dev/null 2>&1; then | |
| echo "INFO: setup_deferred_service: Enabling systemd service ${service_name}" >&2 | |
| systemctl enable "${service_name}.service" || true | |
| else | |
| echo "WARNING: setup_deferred_service: systemctl not found. Manually enabling service..." >&2 | |
| mkdir -p /etc/systemd/system/multi-user.target.wants | |
| ln -sf "${service_file}" "/etc/systemd/system/multi-user.target.wants/${service_name}.service" | |
| fi |
|
/gemini review Here are drafted replies you can copy-paste directly into your code review tool (such as Critique or GitHub). These replies politely accept the robustness improvements while correcting the bot's assumption about the build environment (clarifying that the primary Dataproc custom image 1. Reply to the configure_systemd_proxy / systemctl daemon-reexec thread:│ Acknowledged. While the Dataproc custom image builder ( generate_custom_image.py ) runs on a live GCE VM where systemd is guaranteed to be running as PID 1, guarding the 2. Reply to the IS_CUSTOM_IMAGE_BUILD context detection thread:│ Correct. The run.sh entry point does not export IS_CUSTOM_IMAGE_BUILD to the environment during the customization phase. I have updated the check to fall back to the custom- 3. Reply to the hostname -s vs uname -n portability thread:│ Good catch. I have replaced the hostname -s call with uname -n and Bash parameter expansion ( ${hostname%%.*} ). This removes the dependency on the external hostname 4. Reply to the systemctl enable / chroot-safe fallback thread:│ Excellent point. In the primary Dataproc custom image workflow, we customize the image on a live GCE VM where systemd is active. However, to support offline chroot-based |
There was a problem hiding this comment.
Code Review
This pull request updates the gce-proxy-setup.sh script to fallback to deriving the cluster name from the hostname, configure systemd proxy overrides, and register a deferred systemd service (dataproc-proxy-init) for custom image builds. The review comments identify critical issues with the current implementation: parsing space-separated values in no-proxy incorrectly concatenates them into a single invalid hostname, and hard-failing when systemd is not running as PID 1 breaks offline image customization workflows (such as Packer or Docker builds). The reviewer provides actionable suggestions to robustly parse the proxy list, write systemd configurations regardless of the active PID 1 status, and manually symlink the deferred service if systemctl is unavailable.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| local user_no_proxy_list=() | ||
| if [[ -n "${user_no_proxy}" ]]; then | ||
| IFS=',' read -r -a user_no_proxy_list <<< "${user_no_proxy// /,}" | ||
| IFS=',' read -r -a user_no_proxy_list <<< "${user_no_proxy//[[:space:]]/}" |
There was a problem hiding this comment.
If the no-proxy metadata attribute contains space-separated values (e.g., a b c), using ${user_no_proxy//[[:space:]]/} will strip all whitespace and concatenate them into a single invalid hostname (e.g., abc).
To robustly handle both comma-separated and space-separated lists (including mixed formats with arbitrary whitespace) without creating empty array elements, we can replace all commas with spaces and use read -r -a with the default whitespace IFS.
| IFS=',' read -r -a user_no_proxy_list <<< "${user_no_proxy//[[:space:]]/}" | |
| IFS=$' \t\n' read -r -a user_no_proxy_list <<< "${user_no_proxy//,/ }" |
| if [[ ! -d /run/systemd/system ]]; then | ||
| echo "ERROR: configure_systemd_proxy: systemd is not running as PID 1. This environment is unsupported." >&2 | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
Exiting with exit 1 when systemd is not running as PID 1 prevents the script from running successfully during offline image builds (such as in chroots, Docker containers, or Packer environments).
Instead of failing fast, we should write the systemd configuration file anyway (so it is baked into the final image and applied on the next boot) and only skip the runtime systemctl daemon-reexec command.
| if [[ ! -d /run/systemd/system ]]; then | |
| echo "ERROR: configure_systemd_proxy: systemd is not running as PID 1. This environment is unsupported." >&2 | |
| exit 1 | |
| fi | |
| # Note: We write the configuration even if systemd is not currently running as PID 1 (e.g. during offline image builds) so it applies on boot. |
| if ! command -v systemctl >/dev/null 2>&1 || [[ ! -d /run/systemd/system ]]; then | ||
| echo "ERROR: setup_deferred_service: systemd is not running as PID 1. Cannot enable deferred service." >&2 | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo "INFO: setup_deferred_service: Enabling systemd service ${service_name}" >&2 | ||
| systemctl enable "${service_name}.service" |
There was a problem hiding this comment.
Requiring systemd to be running as PID 1 to enable the deferred service breaks offline image customization workflows. systemctl enable does not require systemd to be running as PID 1 to manage symlinks, and we can also provide a manual symlink fallback if systemctl is entirely missing from the build environment.
| if ! command -v systemctl >/dev/null 2>&1 || [[ ! -d /run/systemd/system ]]; then | |
| echo "ERROR: setup_deferred_service: systemd is not running as PID 1. Cannot enable deferred service." >&2 | |
| exit 1 | |
| fi | |
| echo "INFO: setup_deferred_service: Enabling systemd service ${service_name}" >&2 | |
| systemctl enable "${service_name}.service" | |
| if command -v systemctl >/dev/null 2>&1; then | |
| echo "INFO: setup_deferred_service: Enabling systemd service ${service_name}" >&2 | |
| systemctl enable "${service_name}.service" | |
| else | |
| echo "INFO: setup_deferred_service: systemctl not found. Manually enabling service..." >&2 | |
| mkdir -p /etc/systemd/system/multi-user.target.wants | |
| ln -sf "${service_file}" "/etc/systemd/system/multi-user.target.wants/${service_name}.service" | |
| fi |
…guide
This change set addresses file resolution in the custom image builder,
refactors the guest proxy setup script, and adds a verification guide:
1. Path Resolution: Resolves a failure in the Python custom image builder
(`generate_custom_image.py`) where it could not locate the secure-boot
key generation script when executed as a `.par` file in a remote Borg
runner environment.
2. Guest Proxy Setup: Refactors the `gce-proxy-setup.sh` script into a
decoupled three-phase architecture (Introspection, Planning/Reconciliation,
and Mutation) backed by a guest-state JSON cache.
3. Testing & Verification: Adds `TESTING.md` to document the post-build
verification workflow for secure-boot custom images.
4. Configuration Sample: Updates `env.json.sample` to include all required
variables, defaulting the customization script to a baseline test.
---
### Detailed Changes
#### 1. Custom Image Builder (Python)
* **File**: `custom_image_utils/shell_script_generator.py`
* **Path Resolution**: Replaced the hardcoded relative path to
`create-key-pair.sh` with a multi-candidate search algorithm.
It now searches:
1. Relative to the current working directory (CWD).
2. Relative to the Python module file location.
3. Relative to `sys.argv[0]` (the executing `.par` or script directory).
* **Bash Template Integration**: Sourced and resolved the script path in
Python and injected it into the generated Bash template via the
`{create_key_pair_script}` placeholder.
#### 2. Guest Proxy Setup (Bash)
* **File**: `startup_script/gce-proxy-setup.sh`
* **Three-Phase Architecture**:
* **Phase 1: Introspection**: Audits the VM and GCE metadata.
* Introduced `initialize_guest_state` which stores GCE metadata
and system state in a JSON cache in `/dev/shm/metadata_cache`.
* Restricted GCE metadata queries to prevent downloading project-wide
SSH keys.
* Enforces the cache with `get_cached_state` via `jq` queries.
* **Phase 2: Planning & Reconciliation**:
* Refactored `set_proxy` to compare the desired state (asserted by
GCE metadata) against the actual state (audited in Phase 1).
* Logs a planning report showing skipping or mutation actions for
`/etc/environment` and systemd.
* Implements idempotency: exits early if actual state matches desired
state, unless forced via `FORCE_APPLY=1`.
* Added a fallback to derive the Dataproc cluster name from the
hostname if the `dataproc-cluster-name` metadata is missing.
* **Phase 3: Mutation**: Applies the planned system modifications.
* Added `configure_systemd_proxy` to inject proxy overrides into the
systemd manager configuration and trigger `systemctl daemon-reexec`.
* Refactored `repair_boto` to use cached state and updated `sed`
regex for proxy key stripping.
* Added `setup_deferred_service` to copy itself to `/usr/local/sbin/`
and provision a oneshot systemd service (`dataproc-proxy-init.service`)
that runs BEFORE `google-dataproc-agent.service` on boot.
#### 3. Documentation & Samples (Markdown / JSON)
* **File**: `examples/secure-boot/TESTING.md`
* Added a verification guide detailing cluster provisioning, driver load
checks (`nvidia-smi`), UEFI module signature auditing (`modinfo`),
and kernel event logging (`dmesg`). Includes timing references for
build phases.
* **File**: `examples/secure-boot/env.json.sample`
* Added missing required variables `IMAGE_VERSION` and `CUSTOMIZATION_SCRIPT`.
* Defaulted `CUSTOMIZATION_SCRIPT` to `examples/secure-boot/no-customization.sh`.
TAG=agy
f822c9d to
1e44115
Compare
Address review feedback on PR GoogleCloudDataproc#126 by improving the robustness and modularity of the systemd proxy configuration helper in `gce-proxy-setup.sh`: 1. **Exit to Return Refactor:** Replace `exit 1` with `return 1` inside `configure_systemd_proxy` when systemd is not running as PID 1. This decouples the function from the global script execution, improving modularity and enabling easier unit testing. The script's active `set -euo pipefail` still guarantees a fail-fast exit at the call site. 2. **Strict daemon-reexec Verification:** Remove the permissive `|| true` from `systemctl daemon-reexec` and replace it with an explicit error handling block. If the re-exec fails, the function now logs a distinct error and returns `1` to trigger a fail-fast abort. This replaces a technically infeasible suggestion to check `UserspaceTimestamp` (which is a boot-time constant and does not change on re-exec). 3. **Explicit Success Return:** Add `return 0` at the end of the function to ensure it always returns a clean status on successful execution or noop bypass. TAG=agy CONV=b274b565-1bd6-43f1-b4db-31f3d89d087b
Refactor custom image builder, guest proxy setup, and add validation guide
This change set addresses file resolution in the custom image builder,
refactors the guest proxy setup script, and adds a verification guide:
Path Resolution: Resolves a failure in the Python custom image builder
(
generate_custom_image.py) where it could not locate the secure-bootkey generation script when executed as a
.parfile in a remote Borgrunner environment.
Guest Proxy Setup: Refactors the
gce-proxy-setup.shscript into adecoupled three-phase architecture (Introspection, Planning/Reconciliation,
and Mutation) backed by a guest-state JSON cache.
Testing & Verification: Adds
TESTING.mdto document the post-buildverification workflow for secure-boot custom images.
Configuration Sample: Updates
env.json.sampleto include all requiredvariables, defaulting the customization script to a baseline test.
Detailed Changes
1. Custom Image Builder (Python)
custom_image_utils/shell_script_generator.pycreate-key-pair.shwith a multi-candidate search algorithm.It now searches:
sys.argv[0](the executing.paror script directory).Python and injected it into the generated Bash template via the
{create_key_pair_script}placeholder.2. Guest Proxy Setup (Bash)
startup_script/gce-proxy-setup.shinitialize_guest_statewhich stores GCE metadataand system state in a JSON cache in
/dev/shm/metadata_cache.SSH keys.
get_cached_stateviajqqueries.set_proxyto compare the desired state (asserted byGCE metadata) against the actual state (audited in Phase 1).
/etc/environmentand systemd.state, unless forced via
FORCE_APPLY=1.hostname if the
dataproc-cluster-namemetadata is missing.configure_systemd_proxyto inject proxy overrides into thesystemd manager configuration and trigger
systemctl daemon-reexec.repair_bototo use cached state and updatedsedregex for proxy key stripping.
setup_deferred_serviceto copy itself to/usr/local/sbin/and provision a oneshot systemd service (
dataproc-proxy-init.service)that runs BEFORE
google-dataproc-agent.serviceon boot.3. Documentation & Samples (Markdown / JSON)
examples/secure-boot/TESTING.mdchecks (
nvidia-smi), UEFI module signature auditing (modinfo),and kernel event logging (
dmesg). Includes timing references forbuild phases.
examples/secure-boot/env.json.sampleIMAGE_VERSIONandCUSTOMIZATION_SCRIPT.CUSTOMIZATION_SCRIPTtoexamples/secure-boot/no-customization.sh.TAG=agy