Examples: Support Arm builds, interactive debugging, and proxy configuration#127
Conversation
Update the Secure Boot custom image builder to support Arm architectures, add an interactive debugging workflow, configure Secure Web Proxy (SWP), update path resolution, and sanitize the public .gitignore. ### 1. Arm Architecture Support * **Targets:** Added `2.2-ubuntu22-arm` and `2.3-ubuntu22-arm` image chains. * **Machine Type:** Added `create_arm_instance` to use `t2a-standard-2` GCE instances for Arm builds. * **Build Paths:** Bypass GPU/ML stages (TensorFlow, PyTorch, RAPIDS, Spark) for Arm architectures. * **Base Image:** Fall back to `secure-proxy` instead of `proxy-tf` for optional components (Docker, Jupyter, Zeppelin, Pig) on Arm. ### 2. Interactive Debugging Workflow * **Documentation:** Documented the `customize-in-screen.sh` workflow for running customization on persistent debug VMs. * **Disk Monitoring:** Install `screen` on the VM before running the disk usage monitor in `no-customization.sh`. * **State Resolution:** Resolve the build directory on failure using `sort | tail -n1` to handle multiple directories. ### 3. Proxy & Network Integration * **SWP Variables:** Pass `SWP_IP`, `SWP_PORT`, and `PROXY_CERT_GCS_PATH` through the Podman runner to the builder VM. * **Environment:** Parse and export SWP variables from `env.json`. * **Integration:** Configure the builder and debug VM to use the local `startup_script/gce-proxy-setup.sh` as the authoritative proxy configuration script. ### 4. Code Robustness & Sanitization * **Path Resolution:** Traverse parent directories in `env.sh` to resolve `DATAPROC_EVOLUTION_DIR` across nested git boundaries. * **Error Handling:** Update `run_gcloud` to capture and return exit codes under `set -e`. * **Globbing:** Prevent errors in `build-current-images.sh` when no logs match the timestamp. * **Repository Sanitization:** Sanitize `.gitignore` to remove development scratchpads, local draft scripts, and specific file patterns, replacing them with generic rules for keys, logs, and temporary directories. TAG=agy CONV=b274b565-1bd6-43f1-b4db-31f3d89d087b
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive suite of scripts and documentation for manual customization, interactive debugging, and diagnostics of Dataproc secure boot images, including support for ARM-based instances and updated CUDA versions. The review feedback highlights several critical areas for improvement: resolving a variable expansion bug in cleanup-builders.sh, removing hardcoded debug overrides in ssh-builder, avoiding redundant user prompts, and addressing security concerns with predictable temporary filenames. Additionally, portability issues on macOS (such as stat and readlink usage) should be resolved, and script robustness should be enhanced by adding error handling to remote SSH blocks, improving screen PID retrieval, and using proper argument propagation ("$@").
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| SCREEN_CMD=" | ||
| attempt=0 | ||
| while true; do | ||
| gcloud beta compute instances delete ${instance} --zone=${ZONE} --project=${PROJECT_ID} --quiet --no-graceful-shutdown && break | ||
| attempt=$((attempt + 1)) | ||
| if [[ ${attempt} -ge 3 ]]; then | ||
| echo 'Max retries reached for ${instance}' | ||
| break | ||
| fi | ||
| echo 'Failed, retrying in 5 seconds...' | ||
| read -t 5 | ||
| done | ||
| echo \"Delete command for ${instance} finished with exit code $?\" | ||
| echo \"Window will close in 5 seconds...\" | ||
| read -t 5 | ||
| " |
There was a problem hiding this comment.
Because SCREEN_CMD is defined using double quotes, variables like attempt and $? are eagerly expanded on the host workstation rather than inside the screen session. Since attempt is unset on the host, attempt=$((attempt + 1)) expands to attempt=$(( + 1)), which is a syntax error. Additionally, $? will report the exit code of the host's last command instead of the remote gcloud command. Escaping the $ character for these variables defers their expansion to execution time.
| SCREEN_CMD=" | |
| attempt=0 | |
| while true; do | |
| gcloud beta compute instances delete ${instance} --zone=${ZONE} --project=${PROJECT_ID} --quiet --no-graceful-shutdown && break | |
| attempt=$((attempt + 1)) | |
| if [[ ${attempt} -ge 3 ]]; then | |
| echo 'Max retries reached for ${instance}' | |
| break | |
| fi | |
| echo 'Failed, retrying in 5 seconds...' | |
| read -t 5 | |
| done | |
| echo \"Delete command for ${instance} finished with exit code $?\" | |
| echo \"Window will close in 5 seconds...\" | |
| read -t 5 | |
| " | |
| SCREEN_CMD=" | |
| attempt=0 | |
| while true; do | |
| gcloud beta compute instances delete ${instance} --zone=${ZONE} --project=${PROJECT_ID} --quiet --no-graceful-shutdown && break | |
| attempt=\$((attempt + 1)) | |
| if [[ \${attempt} -ge 3 ]]; then | |
| echo 'Max retries reached for ${instance}' | |
| break | |
| fi | |
| echo 'Failed, retrying in 5 seconds...' | |
| read -t 5 | |
| done | |
| echo \"Delete command for ${instance} finished with exit code \$?\" | |
| echo \"Window will close in 5 seconds...\" | |
| read -t 5 | |
| " |
| if [[ -z "${INSTANCE_NAME}" ]]; then | ||
| report_result "Not Found" | ||
| echo "No RUNNING builder instance found matching the prefix." >&2 | ||
| # exit 1 | ||
| fi | ||
|
|
||
| report_result "Found: ${INSTANCE_NAME}" | ||
|
|
||
| print_status "SSHing into ${INSTANCE_NAME}"... | ||
|
|
||
| INSTANCE_NAME=debug-deb12-build | ||
|
|
||
| gcloud compute ssh "${INSTANCE_NAME}" --zone "${ZONE}" --project "${PROJECT_ID}" |
There was a problem hiding this comment.
This script contains leftover debug code that overrides the dynamically discovered INSTANCE_NAME with a hardcoded value (debug-deb12-build). Additionally, the exit 1 statement is commented out, which allows the script to proceed even if no running builder instance is found. Uncommenting the exit statement and removing the hardcoded override restores the script's intended dynamic behavior.
| if [[ -z "${INSTANCE_NAME}" ]]; then | |
| report_result "Not Found" | |
| echo "No RUNNING builder instance found matching the prefix." >&2 | |
| # exit 1 | |
| fi | |
| report_result "Found: ${INSTANCE_NAME}" | |
| print_status "SSHing into ${INSTANCE_NAME}"... | |
| INSTANCE_NAME=debug-deb12-build | |
| gcloud compute ssh "${INSTANCE_NAME}" --zone "${ZONE}" --project "${PROJECT_ID}" | |
| if [[ -z "${INSTANCE_NAME}" ]]; then | |
| report_result "Not Found" | |
| echo "No RUNNING builder instance found matching the prefix." >&2 | |
| exit 1 | |
| fi | |
| report_result "Found: ${INSTANCE_NAME}" | |
| print_status "SSHing into ${INSTANCE_NAME}"... | |
| gcloud compute ssh "${INSTANCE_NAME}" --zone "${ZONE}" --project "${PROJECT_ID}" |
| TEMP_SCREENRC="${tmpdir}/temp_cleanup.screenrc" | ||
| rm -f "${TEMP_SCREENRC}" | ||
| touch "${TEMP_SCREENRC}" | ||
|
|
||
| echo "The following builder VMs will be deleted in a new screen session:" >&2 | ||
| echo "${INSTANCE_LIST}" >&2 | ||
|
|
||
| read -p "Continue with deletion? (y/N): " confirm | ||
| if [[ "${confirm}" != [yY] ]]; then | ||
| echo "Deletion cancelled." >&2 | ||
| exit 0 | ||
| fi |
There was a problem hiding this comment.
The script prompts the user for confirmation twice with the exact same message. Removing the second redundant confirmation prompt improves usability and avoids confusion.
| TEMP_SCREENRC="${tmpdir}/temp_cleanup.screenrc" | |
| rm -f "${TEMP_SCREENRC}" | |
| touch "${TEMP_SCREENRC}" | |
| echo "The following builder VMs will be deleted in a new screen session:" >&2 | |
| echo "${INSTANCE_LIST}" >&2 | |
| read -p "Continue with deletion? (y/N): " confirm | |
| if [[ "${confirm}" != [yY] ]]; then | |
| echo "Deletion cancelled." >&2 | |
| exit 0 | |
| fi | |
| TEMP_SCREENRC="${tmpdir}/temp_cleanup.screenrc" | |
| rm -f "${TEMP_SCREENRC}" | |
| touch "${TEMP_SCREENRC}" |
| GSA="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" | ||
|
|
||
| GCS_SOURCES_PATH="gs://${BUCKET}/${INSTANCE_NAME}/sources" | ||
| CACHE_FILE="/tmp/latest_dataproc_$(echo "${IMAGE_VERSION}" | tr './-' '_').txt" |
There was a problem hiding this comment.
Using a predictable filename directly in /tmp can lead to permission conflicts or symlink attacks on shared multi-user systems. Incorporating the current user's name into the filename prevents conflicts when multiple developers run the script on the same workstation or bastion host.
| CACHE_FILE="/tmp/latest_dataproc_$(echo "${IMAGE_VERSION}" | tr './-' '_').txt" | |
| CACHE_FILE="/tmp/latest_dataproc_${USER}_$(echo "${IMAGE_VERSION}" | tr './-' '_').txt" |
| # Determine Dataproc Image | ||
| if [[ -n "${DATAPROC_IMAGE:-}" ]]; then | ||
| echo "Using image from ENV: ${DATAPROC_IMAGE}" | ||
| elif [[ -f "${CACHE_FILE}" ]] && [[ $(($(date +%s) - $(stat -c %Y "${CACHE_FILE}"))) -lt "${CACHE_TTL_SECONDS}" ]]; then |
There was a problem hiding this comment.
The stat -c %Y command is specific to GNU stat and is not portable to macOS workstations, where it will fail with a syntax error. Using date -r is highly portable and works seamlessly on both macOS and Linux.
| elif [[ -f "${CACHE_FILE}" ]] && [[ $(($(date +%s) - $(stat -c %Y "${CACHE_FILE}"))) -lt "${CACHE_TTL_SECONDS}" ]]; then | |
| elif [[ -f "${CACHE_FILE}" ]] && [[ $(($(date +%s) - $(date -r "${CACHE_FILE}" +%s))) -lt "${CACHE_TTL_SECONDS}" ]]; then |
| ssh -o ControlMaster=no -o BatchMode=yes -o ConnectTimeout=10 "${INSTANCE_NAME}" " | ||
| GCS_PATH=\$(curl -s -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/attributes/custom-sources-path) | ||
| rm -rf /tmp/sources | ||
| mkdir -p /tmp/sources | ||
| gsutil -m cp -r \"\${GCS_PATH}/*\" /tmp/sources/ >/dev/null | ||
| chmod +x /tmp/sources/*.sh | ||
| # Spawn the guest screen bootstrap wrapper in the background, passing FORCE_APPLY status | ||
| nohup env FORCE_APPLY=${FORCE_APPLY} bash /tmp/sources/install-in-screen.sh /tmp/sources/init_actions.sh >/tmp/launcher.log 2>&1 & | ||
| sleep 1 | ||
| " |
There was a problem hiding this comment.
The remote command block executed via SSH does not run with set -e enabled. If gsutil or curl fails, the script will silently continue and attempt to run install-in-screen.sh with missing or corrupted files. Adding set -euo pipefail ensures the remote execution fails fast on any error.
| ssh -o ControlMaster=no -o BatchMode=yes -o ConnectTimeout=10 "${INSTANCE_NAME}" " | |
| GCS_PATH=\$(curl -s -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/attributes/custom-sources-path) | |
| rm -rf /tmp/sources | |
| mkdir -p /tmp/sources | |
| gsutil -m cp -r \"\${GCS_PATH}/*\" /tmp/sources/ >/dev/null | |
| chmod +x /tmp/sources/*.sh | |
| # Spawn the guest screen bootstrap wrapper in the background, passing FORCE_APPLY status | |
| nohup env FORCE_APPLY=${FORCE_APPLY} bash /tmp/sources/install-in-screen.sh /tmp/sources/init_actions.sh >/tmp/launcher.log 2>&1 & | |
| sleep 1 | |
| " | |
| ssh -o ControlMaster=no -o BatchMode=yes -o ConnectTimeout=10 "${INSTANCE_NAME}" " | |
| set -euo pipefail | |
| GCS_PATH=\$(curl -s -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/attributes/custom-sources-path) | |
| rm -rf /tmp/sources | |
| mkdir -p /tmp/sources | |
| gsutil -m cp -r \"\${GCS_PATH}/*\" /tmp/sources/ >/dev/null | |
| chmod +x /tmp/sources/*.sh | |
| # Spawn the guest screen bootstrap wrapper in the background, passing FORCE_APPLY status | |
| nohup env FORCE_APPLY=${FORCE_APPLY} bash /tmp/sources/install-in-screen.sh /tmp/sources/init_actions.sh >/tmp/launcher.log 2>&1 & | |
| sleep 1 | |
| " |
| # --- Wait and Stream Logs --- | ||
| # We find the PID of the newly spawned screen session | ||
| sleep 2 | ||
| SCREEN_PID=$(screen -ls | grep customization | awk '{print $1}' | cut -d. -f1 || echo "") |
There was a problem hiding this comment.
If multiple screen sessions contain the word 'customization' (e.g., from other users or dead sessions), grep customization will return multiple lines, causing SCREEN_PID to contain multiple PIDs separated by newlines. This will cause the subsequent kill -0 loop to fail with a syntax error. Grepping for the exact .customization suffix and limiting the output to a single line using head -n 1 makes the PID retrieval robust.
| SCREEN_PID=$(screen -ls | grep customization | awk '{print $1}' | cut -d. -f1 || echo "") | |
| SCREEN_PID=$(screen -ls | grep "\.customization" | head -n 1 | awk '{print $1}' | cut -d. -f1 || echo "") |
| export DATAPROC_EVOLUTION_DIR="$(realpath "${SCRIPT_DIR}/../../../..")" | ||
| # Traverse upwards from env.sh's directory to resolve the repository root | ||
| # containing both custom-images and cloud-dataproc, bypassing nested git boundaries. | ||
| current_dir=$(dirname "$(readlink -f "${BASH_SOURCE[0]}")") |
There was a problem hiding this comment.
The readlink -f command is not portable to macOS workstations, where it will fail with an error. Using a portable subshell with cd and pwd safely resolves the absolute path of the script's directory on both macOS and Linux.
| current_dir=$(dirname "$(readlink -f "${BASH_SOURCE[0]}")") | |
| current_dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) |
| function create_arm_instance() { | ||
| python3 generate_custom_image.py \ | ||
| --machine-type "t2a-standard-2" \ | ||
| $* | ||
| } |
There was a problem hiding this comment.
Using $* inside the function splits arguments on spaces, which can break arguments containing spaces or special characters. Using "$@" is the standard best practice to preserve argument boundaries correctly.
| function create_arm_instance() { | |
| python3 generate_custom_image.py \ | |
| --machine-type "t2a-standard-2" \ | |
| $* | |
| } | |
| function create_arm_instance() { | |
| python3 generate_custom_image.py \\ | |
| --machine-type "t2a-standard-2" \\ | |
| "$@" | |
| } |
Update the Secure Boot custom image builder to support Arm architectures, add an interactive debugging workflow, configure Secure Web Proxy (SWP), update path resolution, and sanitize the public .gitignore. ### 1. Arm Architecture Support * **Targets:** Added `2.2-ubuntu22-arm` and `2.3-ubuntu22-arm` image chains. * **Machine Type:** Added `create_arm_instance` to use `t2a-standard-2` GCE instances for Arm builds. * **Build Paths:** Bypass GPU/ML stages (TensorFlow, PyTorch, RAPIDS, Spark) for Arm architectures. * **Base Image:** Fall back to `secure-proxy` instead of `proxy-tf` for optional components (Docker, Jupyter, Zeppelin, Pig) on Arm. * **Prerelease Support:** Added temporary prerelease base image URI for `2.3-ubuntu22-arm` to enable testing. ### 2. Interactive Debugging Workflow * **Documentation:** Documented the `customize-in-screen.sh` workflow for running customization on persistent debug VMs. * **Orchestration:** Added `customize-in-screen.sh` to manage the idempotent VM lifecycle and screen attachment. * **Guest Bootstrap:** Added `install-in-screen.sh` to run customization inside a detached screen session on the VM. * **Diagnostics:** Added `audit-image-customizer.sh` to perform remote network and proxy connectivity audits. * **Helpers:** Added `create-debug-vm.sh`, `destroy-debug-vm.sh`, `ssh-debug-vm.sh`, and `scp-debug-vm.sh` to manage the debug VM. * **Cleanup:** Added `cleanup-builders.sh` to clean up lingering builder VMs in parallel using screen. ### 3. Proxy & Network Integration * **SWP Variables:** Pass `SWP_IP`, `SWP_PORT`, and `PROXY_CERT_GCS_PATH` through the Podman runner to the builder VM. * **Environment:** Parse and export SWP variables from `env.json`. * **Integration:** Configure the builder and debug VM to use the local `gce-proxy-setup.sh` as the authoritative proxy configuration script. ### 4. Code Robustness & Sanitization * **Path Resolution:** Traverse parent directories in `env.sh` to resolve `DATAPROC_EVOLUTION_DIR` across nested git boundaries. * **Error Handling:** Update `run_gcloud` to capture and return exit codes under `set -e`. * **Globbing:** Prevent errors in `build-current-images.sh` when no logs match the timestamp. * **Repository Sanitization:** Sanitize `.gitignore` to remove development scratchpads, local draft scripts, and specific file patterns, replacing them with generic rules for keys, logs, and temporary directories. TAG=agy CONV=b274b565-1bd6-43f1-b4db-31f3d89d087b
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces an interactive debugging and manual customization workflow for GCE image customization, including scripts to create, SSH into, audit, and destroy debug VMs, alongside updated documentation and ARM support. The review feedback highlights several critical issues: leftover debugging code in ssh-builder that hardcodes the instance name, incorrect variable evaluation in cleanup-builders.sh due to unescaped variables in double-quoted strings, a bootstrap issue in install-in-screen.sh where screen is used before installation, redundant confirmation prompts in cleanup-builders.sh, and a portability issue with stat on macOS in create-debug-vm.sh.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| --sort-by=\"~creationTimestamp\" \ | ||
| | head -n 1) | ||
|
|
||
| if [[ -z "${INSTANCE_NAME}" ]]; then | ||
| report_result "Not Found" | ||
| echo "No RUNNING builder instance found matching the prefix." >&2 | ||
| # exit 1 | ||
| fi | ||
|
|
||
| report_result "Found: ${INSTANCE_NAME}" | ||
|
|
||
| print_status "SSHing into ${INSTANCE_NAME}"... | ||
|
|
||
| INSTANCE_NAME=debug-deb12-build | ||
|
|
||
| gcloud compute ssh "${INSTANCE_NAME}" --zone "${ZONE}" --project "${PROJECT_ID}" |
There was a problem hiding this comment.
The script contains leftover debugging code that hardcodes INSTANCE_NAME=debug-deb12-build on line 32, completely overriding the dynamically discovered builder instance. Additionally, the exit 1 on line 25 is commented out, which would cause the script to proceed even if no running builder instance is found. Finally, the --sort-by argument on line 19 has unnecessarily escaped double quotes (\"~creationTimestamp\"), which can cause parsing issues in gcloud.
| --sort-by=\"~creationTimestamp\" \ | |
| | head -n 1) | |
| if [[ -z "${INSTANCE_NAME}" ]]; then | |
| report_result "Not Found" | |
| echo "No RUNNING builder instance found matching the prefix." >&2 | |
| # exit 1 | |
| fi | |
| report_result "Found: ${INSTANCE_NAME}" | |
| print_status "SSHing into ${INSTANCE_NAME}"... | |
| INSTANCE_NAME=debug-deb12-build | |
| gcloud compute ssh "${INSTANCE_NAME}" --zone "${ZONE}" --project "${PROJECT_ID}" | |
| --sort-by="~creationTimestamp" \ | |
| | head -n 1) | |
| if [[ -z "${INSTANCE_NAME}" ]]; then | |
| report_result "Not Found" | |
| echo "No RUNNING builder instance found matching the prefix." >&2 | |
| exit 1 | |
| fi | |
| report_result "Found: ${INSTANCE_NAME}" | |
| print_status "SSHing into ${INSTANCE_NAME}"... | |
| gcloud compute ssh "${INSTANCE_NAME}" --zone "${ZONE}" --project "${PROJECT_ID}" |
| SCREEN_CMD=" | ||
| attempt=0 | ||
| while true; do | ||
| gcloud beta compute instances delete ${instance} --zone=${ZONE} --project=${PROJECT_ID} --quiet --no-graceful-shutdown && break | ||
| attempt=$((attempt + 1)) | ||
| if [[ ${attempt} -ge 3 ]]; then | ||
| echo 'Max retries reached for ${instance}' | ||
| break | ||
| fi | ||
| echo 'Failed, retrying in 5 seconds...' | ||
| read -t 5 | ||
| done | ||
| echo \"Delete command for ${instance} finished with exit code $?\" | ||
| echo \"Window will close in 5 seconds...\" | ||
| read -t 5 | ||
| " |
There was a problem hiding this comment.
The SCREEN_CMD string is defined using double quotes, which causes the host shell to evaluate several variables and expressions before writing them to TEMP_SCREENRC. Specifically:
$((attempt + 1))on line 71 evaluates to1statically on the host, meaningattemptwill never increment dynamically inside the screen session.${attempt}on line 72 expands to an empty string on the host, resulting in a syntax error ([[ -ge 3 ]]) inside the screen session.$?on line 79 expands to the exit code of the last command run on the host before the loop, rather than the exit code of the command inside the screen session.
To fix this, escape these variables/expressions with a backslash so they are evaluated dynamically inside the screen session.
| SCREEN_CMD=" | |
| attempt=0 | |
| while true; do | |
| gcloud beta compute instances delete ${instance} --zone=${ZONE} --project=${PROJECT_ID} --quiet --no-graceful-shutdown && break | |
| attempt=$((attempt + 1)) | |
| if [[ ${attempt} -ge 3 ]]; then | |
| echo 'Max retries reached for ${instance}' | |
| break | |
| fi | |
| echo 'Failed, retrying in 5 seconds...' | |
| read -t 5 | |
| done | |
| echo \"Delete command for ${instance} finished with exit code $?\" | |
| echo \"Window will close in 5 seconds...\" | |
| read -t 5 | |
| " | |
| SCREEN_CMD=" | |
| attempt=0 | |
| while true; do | |
| gcloud beta compute instances delete ${instance} --zone=${ZONE} --project=${PROJECT_ID} --quiet --no-graceful-shutdown && break | |
| attempt=\$((attempt + 1)) | |
| if [[ \${attempt} -ge 3 ]]; then | |
| echo 'Max retries reached for ${instance}' | |
| break | |
| fi | |
| echo 'Failed, retrying in 5 seconds...' | |
| read -t 5 | |
| done | |
| echo \"Delete command for ${instance} finished with exit code \$?\" | |
| echo \"Window will close in 5 seconds...\" | |
| read -t 5 | |
| " |
|
|
||
| echo "INFO: Preparing to run customization: ${TARGET_SCRIPT}" >&2 | ||
|
|
There was a problem hiding this comment.
There is a chicken-and-egg bootstrap issue with screen. The script install-in-screen.sh attempts to launch the customization script inside a detached screen session, but screen is only installed inside the customization script (no-customization.sh). If screen is not pre-installed on the base image, install-in-screen.sh will fail immediately, and the customization script will never run to install it. We should ensure screen is installed on the host VM before attempting to spawn the screen session.
| echo "INFO: Preparing to run customization: ${TARGET_SCRIPT}" >&2 | |
| if ! command -v screen >/dev/null 2>&1; then | |
| echo "INFO: Installing screen..." >&2 | |
| if command -v dnf >/dev/null 2>&1; then | |
| sudo dnf -y -q install epel-release && sudo dnf -y -q install screen | |
| elif command -v apt-get >/dev/null 2>&1; then | |
| sudo apt-get update -y -qq >/dev/null 2>&1 | |
| sudo apt-get install -y -qq screen >/dev/null 2>&1 | |
| fi | |
| fi | |
| echo "INFO: Preparing to run customization: ${TARGET_SCRIPT}" >&2 |
| report_result "Found" | ||
| echo "The following builder VMs will be deleted in detached screen sessions:" >&2 | ||
| echo "${INSTANCE_LIST}" >&2 | ||
|
|
||
| read -p "Continue with deletion? (y/N): " confirm | ||
| if [[ "${confirm}" != [yY] ]]; then | ||
| echo "Deletion cancelled." >&2 | ||
| exit 0 | ||
| fi | ||
|
|
||
| TEMP_SCREENRC="${tmpdir}/temp_cleanup.screenrc" | ||
| rm -f "${TEMP_SCREENRC}" | ||
| touch "${TEMP_SCREENRC}" | ||
|
|
||
| echo "The following builder VMs will be deleted in a new screen session:" >&2 | ||
| echo "${INSTANCE_LIST}" >&2 | ||
|
|
||
| read -p "Continue with deletion? (y/N): " confirm | ||
| if [[ "${confirm}" != [yY] ]]; then | ||
| echo "Deletion cancelled." >&2 | ||
| exit 0 | ||
| fi |
There was a problem hiding this comment.
There are two redundant, identical confirmation prompts back-to-back (lines 45-49 and lines 58-62). The user is prompted twice to confirm the deletion of the same builder VMs. We should remove the first prompt and proceed directly to setting up the screen session and prompting once.
| report_result "Found" | |
| echo "The following builder VMs will be deleted in detached screen sessions:" >&2 | |
| echo "${INSTANCE_LIST}" >&2 | |
| read -p "Continue with deletion? (y/N): " confirm | |
| if [[ "${confirm}" != [yY] ]]; then | |
| echo "Deletion cancelled." >&2 | |
| exit 0 | |
| fi | |
| TEMP_SCREENRC="${tmpdir}/temp_cleanup.screenrc" | |
| rm -f "${TEMP_SCREENRC}" | |
| touch "${TEMP_SCREENRC}" | |
| echo "The following builder VMs will be deleted in a new screen session:" >&2 | |
| echo "${INSTANCE_LIST}" >&2 | |
| read -p "Continue with deletion? (y/N): " confirm | |
| if [[ "${confirm}" != [yY] ]]; then | |
| echo "Deletion cancelled." >&2 | |
| exit 0 | |
| fi | |
| report_result "Found" | |
| TEMP_SCREENRC="${tmpdir}/temp_cleanup.screenrc" | |
| rm -f "${TEMP_SCREENRC}" | |
| touch "${TEMP_SCREENRC}" | |
| echo "The following builder VMs will be deleted in a new screen session:" >&2 | |
| echo "${INSTANCE_LIST}" >&2 | |
| read -p "Continue with deletion? (y/N): " confirm | |
| if [[ "${confirm}" != [yY] ]]; then | |
| echo "Deletion cancelled." >&2 | |
| exit 0 | |
| fi |
| # Determine Dataproc Image | ||
| if [[ -n "${DATAPROC_IMAGE:-}" ]]; then | ||
| echo "Using image from ENV: ${DATAPROC_IMAGE}" | ||
| elif [[ -f "${CACHE_FILE}" ]] && [[ $(($(date +%s) - $(stat -c %Y "${CACHE_FILE}"))) -lt "${CACHE_TTL_SECONDS}" ]]; then |
There was a problem hiding this comment.
The use of stat -c %Y is not portable to macOS (which uses BSD stat where -c is not supported). Since this script is executed on the developer's workstation, and many developers use macOS, this will cause the script to fail. Using date -r <file> +%s is fully portable and works on both GNU (Linux) and BSD (macOS) systems.
| elif [[ -f "${CACHE_FILE}" ]] && [[ $(($(date +%s) - $(stat -c %Y "${CACHE_FILE}"))) -lt "${CACHE_TTL_SECONDS}" ]]; then | |
| elif [[ -f "${CACHE_FILE}" ]] && [[ $(($(date +%s) - $(date -r "${CACHE_FILE}" +%s))) -lt "${CACHE_TTL_SECONDS}" ]]; then |
…deferred script execution This commit introduces several key improvements to the custom image builder pipeline to enhance debugging, reliability, and extensibility: 1. **VM Retention on Failure for Debugging**: - Modified `startup_script/run.sh` to use an `EXIT` trap (`shutdown_vm`) instead of an inline shutdown at the end of `main`. - The trap checks the exit status and GCE metadata `retain-on-failure`. If the build fails and `retain-on-failure=true` is set, the VM remains running (skips shutdown) to allow live debugging via SSH. - Added `--retain-on-failure` command-line argument to `generate_custom_image.py` (via `args_parser.py` and `shell_script_generator.py`) and enabled it by default in `examples/secure-boot/pre-init.sh`. 2. **Prevent Storage Exhaustion in Monolithic Builds**: - Increased the build disk size for `2.3-debian12` from 50GB to 100GB in `examples/secure-boot/pre-init.sh` to accommodate the cumulative size of GPU/ML libraries and monolithic optional components (Zeppelin, Jupyter, Docker, Pig, Delta). 3. **Dynamic Deferred Config Script Execution**: - Added support in `startup_script/gce-proxy-setup.sh` to dynamically download and execute a custom user script from a GCS URI (specified via `deferred-config-script-uri` metadata) during the deferred boot phase. - Supports optional integrity verification using a SHA256 checksum (specified via `deferred-config-script-sha256` metadata). 4. **QoL and Portability Fixes**: - Updated ARM instance machine type to `c4a-standard-2` in `examples/secure-boot/pre-init.sh`. - Replaced Linux-specific `stat` command with portable `date -r` in `examples/secure-boot/bin/create-debug-vm.sh` for cache TTL checks. TAG=agy CONV=b274b565-1bd6-43f1-b4db-31f3d89d087b
472b57f to
d7a7d3a
Compare
Update the Secure Boot custom image builder to support Arm architectures, add an interactive debugging workflow, configure Secure Web Proxy (SWP), update path resolution, and sanitize the public .gitignore.
1. Arm Architecture Support
2.2-ubuntu22-armand2.3-ubuntu22-armimage chains.create_arm_instanceto uset2a-standard-2GCE instances for Arm builds.secure-proxyinstead ofproxy-tffor optional components (Docker, Jupyter, Zeppelin, Pig) on Arm.2. Interactive Debugging Workflow
customize-in-screen.shworkflow for running customization on persistent debug VMs.screenon the VM before running the disk usage monitor inno-customization.sh.sort | tail -n1to handle multiple directories.3. Proxy & Network Integration
SWP_IP,SWP_PORT, andPROXY_CERT_GCS_PATHthrough the Podman runner to the builder VM.env.json.startup_script/gce-proxy-setup.shas the authoritative proxy configuration script.4. Code Robustness & Sanitization
env.shto resolveDATAPROC_EVOLUTION_DIRacross nested git boundaries.run_gcloudto capture and return exit codes underset -e.build-current-images.shwhen no logs match the timestamp..gitignoreto remove development scratchpads, local draft scripts, and specific file patterns, replacing them with generic rules for keys, logs, and temporary directories.TAG=agy
CONV=b274b565-1bd6-43f1-b4db-31f3d89d087b