fix: correct runId JSON key mismatch in build polling by ptone · Pull Request #247 · ptone/scion

ptone · 2026-06-12T04:38:44Z

Summary

Fixes the blocker bug from PR Scion/harness local build p2 GoogleCloudPlatform/scion#406 where the harness-config build UI hangs indefinitely with no log output
Backend sends runId (camelCase) but frontend was reading run_id (snake_case), so the build run ID was always empty and polling never started
Adds a guard in pollBuildStatus() to reset UI state if the run ID is missing, preventing a permanent spinner

Test plan

Navigate to a harness-config detail page with a Dockerfile
Click "Build Image" → fill in tag → click "Build"
Verify build log output appears and build completes (no more infinite spinner)
Verify build status badge updates correctly on completion

…s and permissions integer

…tion

… README

…oogleCloudPlatform#403) * Add `scion build` command for local harness-config image builds Introduces a top-level `scion build <harness-config-name>` CLI command that builds a container image from a Dockerfile bundled in a harness-config directory. Supports --tag, --base-image, --push, --platform, and --dry-run flags. After a successful build, updates the harness-config's config.yaml image field to reference the built image. Also fixes Dockerfile content hashing: Dockerfiles were previously excluded from ComputeHarnessConfigRevision, so changes to them did not trigger re-sync to the Hub. Removes Dockerfile from the skipBasenames exclusion list. Extracts detectContainerRuntime() from pkg/hub/maintenance_executors.go into a shared pkg/runtime/container.go so both the new build command and the Hub executor can use it. * Address PR review: yaml.Node config update, error handling, dedup settings H1: Replace destructive yaml.Unmarshal/Marshal round-trip with targeted yaml.Node edit for config.yaml image update. Preserves comments, field order, and unknown fields. M1: Handle all os.Stat errors on Dockerfile, not just IsNotExist. M2: Load settings once instead of twice when both --base-image is unset and --push is set. L3: Remove extra blank line in maintenance_executors.go. * Guard against empty harness-config path and non-mapping YAML nodes Add empty-path check after FindHarnessConfigDir to prevent synthetic harness-configs (e.g. 'generic') from resolving Dockerfile against CWD. Verify yaml.MappingNode kind before manipulating doc.Content[0] to handle malformed config.yaml gracefully. --------- Co-authored-by: Scion Agent (harness-local-build-p1) <agent@scion.dev>

* fix: test-login hardening, agent CLI access, and UI nits - M1: Distinguish store.ErrNotFound from transient DB errors in GetUserByEmail — return 500 for unexpected failures instead of silently creating duplicate users. - L1: Add http.MaxBytesReader(4096) body size limit. - L2: Validate email contains "@" before processing. - L3: Track whether displayName was explicitly provided so the default (email) doesn't overwrite an existing user's display name. - Add harness-config subcommands to the agentAllowed map so agents can manage harness configs via the CLI. Fixes 3 (file-browser badge) and 4 (page title) were already resolved in the current codebase. * style: fix gofmt alignment in cli_mode.go * ci: retry after flaky TestPipeline_LogHandlerRegistered port conflict * test: update cli_mode tests to expect harness-config in agentAllowed --------- Co-authored-by: Scion Agent (dev-followup-pr) <agent@scion.dev>

…GoogleCloudPlatform#401) * feat: wire hub OTel metric recorders to Cloud Monitoring The hub's dbmetrics and dispatchmetrics recorders were created with NewDisabled() — all OTel instruments recorded to no-op sinks. This wires them to a real GCP Cloud Monitoring MeterProvider during server startup, lighting up ~20 existing instruments (pg LISTEN/NOTIFY latency, notifications published/delivered/dropped, subscriber lag, dispatch lifecycle, pool stats). New package pkg/observability/hubmetrics provides NewMeterProvider() which creates a PeriodicReader (60s) with the GCP metric exporter. Metric groups (db-notify, db-pool, dispatch, hub-auth, hub-gcp) can be independently disabled via env vars using OTel View Drop(). Graceful degradation: when GCPProjectID is empty or the exporter fails, the hub logs a warning and continues with disabled recorders — identical to the previous behavior. * test: clean up TestIsGroupDisabled table-driven test Use the table's 'want' field directly instead of re-deriving the expected value in the assertion body. * fix: add 10s timeout to hub MeterProvider shutdown Prevents indefinite hang if the GCP metric exporter is unresponsive during server shutdown. * feat: replace hub in-memory counters with OTel instruments (Phase 2) Add OTelMetricsRecorder and OTelGCPTokenMetrics that implement the existing MetricsRecorder and new GCPTokenMetricsRecorder interfaces using OTel instruments for Cloud Monitoring export. Both use a dual-write pattern — OTel instruments for cloud export plus embedded atomic structs for the /api/metrics JSON snapshot endpoint. New metrics: scion.hub.auth.*, scion.hub.registration.count, scion.hub.join.*, scion.hub.rotation.count, scion.hub.dispatch.*, scion.hub.brokers.connected, scion.hub.gcp.token.*, scion.hub.gcp.iam.duration. Closes #240 * fix: address Phase 2 review findings (M1, M2, M3) M1: Add operation attribute to RecordDispatch OTel instruments so dispatch failures can be broken down by operation type. M2: Expand hub-auth metric group to cover all broker lifecycle instruments (registration, join, rotation, brokers, dispatch) — not just scion.hub.auth.*. M3: Gate SetMetrics(otelMetrics) on broker auth being enabled, preventing /api/metrics from showing an all-zeros broker block when auth is disabled. * feat: add pipeline health gauge, export error counter, and structured logging Phase 3 of metrics-delivery: add observability to the agent-side telemetry pipeline so we can confirm it's working end-to-end in production. - Add scion.telemetry.pipeline.status gauge (Int64, value=1) that self-reports via the cloud exporter on a 60s ticker, confirming the pipeline is alive - Add scion.telemetry.export.errors counter with signal and error_type attributes, incrementing on metric/span/log export failures - Add classifyError() to bucket errors into timeout/auth/quota/other - Upgrade credential logging at pipeline startup from log.Info format strings to structured slog.Info with credentials_file, source, project_id, provider, and cloud_configured fields - Add structured slog.Warn when cloud export is not configured, including the env var and well-known path that were checked - Fix slog handler in pkg/sciontool/log to render key=value attributes instead of silently dropping them - Add pipeline_health_test.go with tests for health gauge lifecycle, nil-safe export error recording, and classifyError bucketing * fix: shut down unused TracerProvider and LoggerProvider in initSelfMetrics initSelfMetrics() creates providers via NewProviders() but only needs the MeterProvider. The TracerProvider and LoggerProvider were never shut down, leaking goroutines. Shut them down immediately after creation. * fix(metrics-delivery): address errcheck and staticcheck CI failures - Replace os.Setenv+cleanup with t.Setenv in hubmetrics tests - Wrap standalone os.Unsetenv calls with error checks - Handle Shutdown errors on TracerProvider, LoggerProvider, MeterProvider - Check pipeline.Stop error in test cleanup - Convert switch to tagged form (staticcheck QF1002) - Fix MeterProvider leak on early return in startHealthGauge --------- Co-authored-by: Scion Agent (metrics-phase1-dev) <agent@scion.dev>

…ogleCloudPlatform#405) * fix: suppress commentary (TypeAssistantReply) messages in Discord Add unconditional early return in broker.Publish() to filter TypeAssistantReply before per-channel logic, covering all channels including those with ShowAssistantReply=true from the buggy setup default. Fix the setup default to false for new channel links. Clean up now-dead per-channel isAssistantReply check. * style: run gofmt on skill-related files inherited from main Fix formatting issues in files that were merged from main without gofmt, causing CI Build & Test check to fail. * fix: move commentary filter before webhook routing, remove dead branch Move the TypeAssistantReply early return before webhook routing and message formatting to avoid computing text that gets immediately discarded. Remove the now-unreachable TypeAssistantReply branch from the useWebhook condition. --------- Co-authored-by: Scion Agent (discord-commentary-filter-dev) <agent@scion.dev>

* Add hub build executor and web UI for harness-config images Phase 2: adds BuildHarnessConfigImageExecutor to build container images from a harness-config's bundled Dockerfile, with web UI for triggering builds, monitoring progress, and viewing logs. Includes all review fixes: - Path traversal containment check in file materialization - Uses shared scionruntime.DetectContainerRuntime() from Phase 1 - Uses hc.Slug (identifier-safe) for image names with Name fallback - Consecutive-error limit (5) in build status polling - Reactive checkbox binding (?checked) in build dialog - Property binding (.value) for tag input - Auto-scroll build log to bottom on new output * fix: address PR GoogleCloudPlatform#406 review comments --------- Co-authored-by: Scion Agent (harness-local-build-p2-fix) <agent@scion.dev>

gemini-code-assist

Code Review

This pull request introduces several features and refactorings, including a new build command to compile container images from harness configurations, support for running agents with no credentials (--no-auth), and the integration of OpenTelemetry metrics for the Hub and telemetry pipeline. Additionally, the Discord plugin's standalone mode and database-backed account linking have been replaced with an in-memory implementation. Feedback on these changes highlights several improvement opportunities: resolving a potential memory leak in the frontend build polling loop, validating config.yaml structure and image tag formats, checking if secret file paths are directories, de-duplicating required files in the capture-auth configuration, warning on unrecognized auth fields, and restoring test coverage for the refactored Discord link service.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-12T04:41:50Z

+  private stopBuildPolling(): void {
+    if (this.buildPollTimer) {
+      clearTimeout(this.buildPollTimer);
+      this.buildPollTimer = null;
+    }
+  }


If a build status fetch request is in-flight when the component is disconnected, this.buildRunning remains true. When the fetch promise resolves, it will schedule a new setTimeout and continue polling indefinitely in the background, causing a memory and resource leak. Setting this.buildRunning = false inside stopBuildPolling ensures that any pending polling loop is terminated immediately upon disconnect.

Suggested change

private stopBuildPolling(): void {

if (this.buildPollTimer) {

clearTimeout(this.buildPollTimer);

this.buildPollTimer = null;

}

}

private stopBuildPolling(): void {

this.buildRunning = false;

if (this.buildPollTimer) {

clearTimeout(this.buildPollTimer);

this.buildPollTimer = null;

}

}

gemini-code-assist · 2026-06-12T04:41:51Z

+		if len(doc.Content) > 0 && doc.Content[0].Kind == yaml.MappingNode {
+			mapping := doc.Content[0]
+			found := false
+			for i := 0; i < len(mapping.Content)-1; i += 2 {
+				if mapping.Content[i].Value == "image" {
+					mapping.Content[i+1].Value = outputImage
+					found = true
+					break
+				}
+			}
+			if !found {
+				mapping.Content = append(mapping.Content,
+					&yaml.Node{Kind: yaml.ScalarNode, Value: "image"},
+					&yaml.Node{Kind: yaml.ScalarNode, Value: outputImage},
+				)
+			}
+		}


If config.yaml is empty or its root is not a mapping node, the image field update is silently skipped, but the command still prints a misleading success message. It is safer to validate that the root is a mapping node and return an error if it is not.

Suggested change

if len(doc.Content) > 0 && doc.Content[0].Kind == yaml.MappingNode {

mapping := doc.Content[0]

found := false

for i := 0; i < len(mapping.Content)-1; i += 2 {

if mapping.Content[i].Value == "image" {

mapping.Content[i+1].Value = outputImage

found = true

break

}

}

if !found {

mapping.Content = append(mapping.Content,

&yaml.Node{Kind: yaml.ScalarNode, Value: "image"},

&yaml.Node{Kind: yaml.ScalarNode, Value: outputImage},

)

}

}

if len(doc.Content) == 0 || doc.Content[0].Kind != yaml.MappingNode {

return fmt.Errorf("invalid config.yaml: root must be a mapping")

}

mapping := doc.Content[0]

found := false

for i := 0; i < len(mapping.Content)-1; i += 2 {

if mapping.Content[i].Value == "image" {

mapping.Content[i+1].Value = outputImage

found = true

break

}

}

if !found {

mapping.Content = append(mapping.Content,

&yaml.Node{Kind: yaml.ScalarNode, Value: "image"},

&yaml.Node{Kind: yaml.ScalarNode, Value: outputImage},

)

}

gemini-code-assist · 2026-06-12T04:41:51Z

+			info, err := os.Stat(filePath)
+			if err != nil {
+				log.Error("Failed to stat file %s: %v", filePath, err)
+				os.Exit(1)
+			}
+			if info.Size() > 64*1024 {
+				log.Error("File exceeds 64KB limit (%d bytes)", info.Size())
+				os.Exit(1)
 			}


If the resolved filePath is a directory, os.Stat will succeed but the subsequent os.ReadFile will fail. Adding an explicit check for info.IsDir() provides a clearer error message and prevents trying to read a directory as a secret file.

info, err := os.Stat(filePath) if err != nil { log.Error("Failed to stat file %s: %v", filePath, err) os.Exit(1) } if info.IsDir() { log.Error("File %s is a directory", filePath) os.Exit(1) } if info.Size() > 64*1024 { log.Error("File exceeds 64KB limit (%d bytes)", info.Size()) os.Exit(1) }

gemini-code-assist · 2026-06-12T04:41:51Z

+func setAuthConfigField(auth *api.AuthConfig, field, value string) {
+	switch field {
+	case "GoogleAppCredentials":
+		auth.GoogleAppCredentials = value
+	case "OAuthCreds":
+		auth.OAuthCreds = value
+	case "CodexAuthFile":
+		auth.CodexAuthFile = value
+	case "OpenCodeAuthFile":
+		auth.OpenCodeAuthFile = value
+	case "ClaudeAuthFile":
+		auth.ClaudeAuthFile = value
+	}
+}


If a harness configuration contains a typo in the field mapping (e.g., field: ClaudeAuthFil), it will silently fail to set the field on AuthConfig. Adding a default case to print a warning to stderr using the built-in println function makes debugging configuration typos much easier.

Suggested change

func setAuthConfigField(auth *api.AuthConfig, field, value string) {

switch field {

case "GoogleAppCredentials":

auth.GoogleAppCredentials = value

case "OAuthCreds":

auth.OAuthCreds = value

case "CodexAuthFile":

auth.CodexAuthFile = value

case "OpenCodeAuthFile":

auth.OpenCodeAuthFile = value

case "ClaudeAuthFile":

auth.ClaudeAuthFile = value

}

}

func setAuthConfigField(auth *api.AuthConfig, field, value string) {

switch field {

case "GoogleAppCredentials":

auth.GoogleAppCredentials = value

case "OAuthCreds":

auth.OAuthCreds = value

case "CodexAuthFile":

auth.CodexAuthFile = value

case "OpenCodeAuthFile":

auth.OpenCodeAuthFile = value

case "ClaudeAuthFile":

auth.ClaudeAuthFile = value

default:

println("Warning: unrecognized AuthConfig field:", field)

}

}

gemini-code-assist · 2026-06-12T04:41:51Z

+	var creds []credEntry
+	for _, authType := range authMeta.Types {
+		for _, rf := range authType.RequiredFiles {
+			// Entries with empty TargetSuffix (e.g. gcloud-adc) are intentionally
+			// excluded — these credentials come from well-known system paths and don't
+			// use the suffix-based source derivation.
+			if rf.Name == "" || rf.TargetSuffix == "" {
+				continue
+			}
+			fileType := rf.Type
+			if fileType == "" {
+				fileType = "file"
+			}
+			suffix := rf.TargetSuffix
+			if !strings.HasPrefix(suffix, "/") {
+				suffix = "/" + suffix
+			}
+			source := "~" + suffix
+			creds = append(creds, credEntry{
+				Key:    rf.Name,
+				Source: source,
+				Type:   fileType,
+				Target: source,
+			})
+		}
+	}


If multiple auth types in authMeta.Types declare the same required file (e.g., CLAUDE_AUTH), the generated capture-auth-config.json will contain duplicate entries. This causes capture_auth.py to attempt capturing the same secret multiple times, which can lead to duplicate logs or failures if --force is not set. De-duplicating the entries by rf.Name prevents this.

seen := make(map[string]bool) var creds []credEntry for _, authType := range authMeta.Types { for _, rf := range authType.RequiredFiles { // Entries with empty TargetSuffix (e.g. gcloud-adc) are intentionally // excluded — these credentials come from well-known system paths and don't // use the suffix-based source derivation. if rf.Name == "" || rf.TargetSuffix == "" { continue } if seen[rf.Name] { continue } seen[rf.Name] = true fileType := rf.Type if fileType == "" { fileType = "file" } suffix := rf.TargetSuffix if !strings.HasPrefix(suffix, "/") { suffix = "/" + suffix } source := "~" + suffix creds = append(creds, credEntry{ Key: rf.Name, Source: source, Type: fileType, Target: source, }) } }

gemini-code-assist · 2026-06-12T04:41:51Z

+	tag := e.tag
+	if tag == "" {
+		tag = "latest"
+	}
+	if v := params["tag"]; v != "" {
+		tag = v
+	}


The tag parameter is received directly from the HTTP request body and passed to the container build command. If the tag contains invalid characters or starts with a dash, it can cause the build command to fail with confusing errors or cause argument confusion. It is highly recommended to validate that the tag conforms to standard container image tag format (e.g., matching ^[a-zA-Z0-9_][a-zA-Z0-9_.-]{0,127}$).

Suggested change

tag := e.tag

if tag == "" {

tag = "latest"

}

if v := params["tag"]; v != "" {

tag = v

}

tag := e.tag

if tag == "" {

tag = "latest"

}

if v := params["tag"]; v != "" {

tag = v

}

if tag != "" {

for _, r := range tag {

if !((r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9') || r == '_' || r == '.' || r == '-') {

return fmt.Errorf("invalid tag %q: must only contain alphanumeric characters, underscores, periods, and dashes", tag)

}

}

}

gemini-code-assist · 2026-06-12T04:41:51Z

 type DiscordLinkService struct {
-	store store.DiscordPendingLinkStore
+	mu      sync.Mutex
+	pending map[string]*discordPendingLink // code → pending link

 	verifyMu       sync.Mutex
 	verifyLimiters map[string]*tokenBucket // IP → token bucket


The database-backed DiscordPendingLinkStore has been replaced with an in-memory map, and the corresponding test file pkg/hub/discord_link_test.go was deleted. Deleting the test suite entirely reduces test coverage. It is highly recommended to add a new test file to verify the correctness and thread-safety of the new in-memory DiscordLinkService implementation.

…metrics) (GoogleCloudPlatform#407) Clarify the two distinct metric families in Scion: - Infrastructure metrics (scion.hub.*, scion.db.*, scion.dispatch.*) for platform health, produced by the Hub process - Agent metrics (gen_ai.*, agent.*) for harness/model telemetry, produced inside agent containers via the telemetry pipeline Also defines the Telemetry pipeline term. Co-authored-by: Scion Agent (metrics-architect) <agent@scion.dev>

Move the runId validation from pollBuildStatus() to startBuild() where the response is parsed. This prevents entering an invalid polling state and keeps the polling logic clean and focused. Addresses review feedback on PR GoogleCloudPlatform#410.

ptone and others added 11 commits June 11, 2026 12:00

fix(starter-hub): add project flag to gcloud iam service-accounts create

69f80b4

docs(discord): add guide for agent-led installation and setup

76ffa9c

docs(discord): improve agent-led installation with interactive detail…

904c5dc

…s and permissions integer

docs(discord): fix configuration examples and add troubleshooting sec…

790f49c

…tion

docs(discord): refine agent prompt and add configuration checklist to…

bffaef0

… README

harness-auth-flow Phase 2 (GoogleCloudPlatform#404)

8b93b43

gemini-code-assist Bot reviewed Jun 12, 2026

View reviewed changes

ptone and others added 2 commits June 12, 2026 04:41

fix: correct runId JSON key mismatch in build polling

9cea42d

ptone force-pushed the scion/harness-build-fix branch from be90a1a to 9cea42d Compare June 12, 2026 16:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: correct runId JSON key mismatch in build polling#247

fix: correct runId JSON key mismatch in build polling#247
ptone wants to merge 14 commits into
mainfrom
scion/harness-build-fix

ptone commented Jun 12, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ptone commented Jun 12, 2026

Summary

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant