Skip to content

fix(antigravity): capture token from gnome-keyring when file path not written by AGY#288

Open
ptone wants to merge 36 commits into
mainfrom
scion/antigravity-capture-keyring
Open

fix(antigravity): capture token from gnome-keyring when file path not written by AGY#288
ptone wants to merge 36 commits into
mainfrom
scion/antigravity-capture-keyring

Conversation

@ptone

@ptone ptone commented Jun 22, 2026

Copy link
Copy Markdown
Owner

Summary

  • provision.py: Save DBUS_SESSION_BUS_ADDRESS to ~/.scion/harness/.dbus-env after keyring init, so other processes (e.g. the tmux shell tab running capture_auth.py) can access the gnome-keyring
  • capture_auth.py: Add keyring fallback — when the file-based capture finds no token at ~/.gemini/antigravity-cli/antigravity-oauth-token, read from gnome-keyring via secret-tool lookup and store via sciontool secret set

Fixes the no-auth capture flow where AGY writes the OAuth token to gnome-keyring but may not persist it to the expected file path.

Test plan

  • Start an antigravity agent in no-auth mode
  • Complete interactive login via agy
  • Verify ~/.scion/harness/.dbus-env is written with the DBUS address
  • Run capture_auth.py — confirm it falls back to keyring and captures AGY_TOKEN
  • Verify sciontool secret get AGY_TOKEN returns the token

ptone and others added 30 commits June 18, 2026 23:17
GoogleCloudPlatform#443)

Per-project import handlers now support NDJSON streaming when the client
sends Accept: application/x-ndjson, matching the unified endpoint. This
fixes the progress display (per-resource events) and the post-import
summary (the NDJSON done event uses the correct "imported" field name).

Also passes discovered names to executeImport even for the count===1
case, ensuring the import is scoped to the exact resource that was
discovered.

Co-authored-by: Scion Agent (template-import-selector-lead) <agent@scion.dev>
…orm#445)

* Warn when server binary is built without web assets

Add a prominent startup warning when web assets are not embedded and
no --web-assets-dir is provided. Serve the self-contained noAssetsPage
HTML from the static asset handler instead of a plain text 404, so
direct requests to /assets/* also show a helpful message.

* Use slog.Warn for missing web assets startup warning

Switch from log.Printf to slog.Warn so the warning formats correctly
in structured logging environments.

---------

Co-authored-by: Scion Agent (web-assets-warning-lead) <agent@scion.dev>
…Platform#444)

* Sync built image reference back to Hub after scion build

After building a harness config image, the new image tag was written to the
on-disk config.yaml but never synced to Hub. At agent start, Hub hydrated
from its stale stored copy (still pointing at the old upstream image), so
the locally-built image was never used.

Two fixes:

1. cmd/build.go: After updating on-disk config.yaml, auto-sync to Hub via
   syncHarnessConfigToHub(). If Hub is unavailable, print a warning
   suggesting the user run 'scion harness-config push'.

2. pkg/hub/maintenance_executors.go: After a successful build in
   BuildHarnessConfigImageExecutor, update the HarnessConfig record's
   Config.Image field in the DB and re-upload the modified config.yaml
   to Hub storage.

* Fix syncBuiltImage to update file manifest and content hash

After uploading the updated config.yaml to storage, recalculate the
hc.Files manifest entry (size + SHA-256 hash) and the overall
hc.ContentHash before writing the DB record. Without this, the Runtime
Broker's cache-invalidation check sees a stale ContentHash and never
pulls the updated config, and subsequent downloads fail manifest
validation against stale size/hash values.

Also validate that config.yaml's root node is a YAML mapping before
processing, returning an error instead of silently skipping malformed
files.

---------

Co-authored-by: Scion Agent (image-build-fix) <agent@scion.dev>
…atform#448)

Only set channel="web" when the authenticated user's ClientType is
"web", not for CLI or API callers hitting the same endpoint. Also
update stale doc-comment on the stream handler.

Co-authored-by: Scion Agent (msg-cleanups-lead) <agent@scion.dev>
…CloudPlatform#442)

* fix(image-build): use default builder for local docker builds

Using a custom docker-container builder (scion-builder) prevents BuildKit from resolving intermediate images built in previous steps when PUSH is false, because the container driver doesn't share the host's daemon image cache. Switching to the default builder for local builds resolves this.

* fix(image-build): use BUILDX_BUILDER env var to avoid mutating global config

Address PR GoogleCloudPlatform#442 reviewer feedback:
- Use the BUILDX_BUILDER environment variable instead of 'docker buildx use' to prevent mutating the user's global Docker CLI configuration.
- For push builds, export BUILDX_BUILDER="${BUILDX_INSTANCE}" and remove the --use flag from 'docker buildx create'.

---------

Co-authored-by: Preston Holmes <ptone@google.com>
…dPlatform#438)

Bumps [astro](https://github.com/withastro/astro/tree/HEAD/packages/astro) from 6.3.2 to 6.4.8.
- [Release notes](https://github.com/withastro/astro/releases)
- [Changelog](https://github.com/withastro/astro/blob/astro@6.4.8/packages/astro/CHANGELOG.md)
- [Commits](https://github.com/withastro/astro/commits/astro@6.4.8/packages/astro)

---
updated-dependencies:
- dependency-name: astro
  dependency-version: 6.4.8
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…Platform#439)

Bumps [dompurify](https://github.com/cure53/DOMPurify) from 3.4.9 to 3.4.11.
- [Release notes](https://github.com/cure53/DOMPurify/releases)
- [Commits](cure53/DOMPurify@3.4.9...3.4.11)

---
updated-dependencies:
- dependency-name: dompurify
  dependency-version: 3.4.11
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…oudPlatform#440)

Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 4.1.1 to 4.2.0.
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodeca/js-yaml/commits)

---
updated-dependencies:
- dependency-name: js-yaml
  dependency-version: 4.2.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…gleCloudPlatform#441)

Bumps [github.com/rclone/rclone](https://github.com/rclone/rclone) from 1.73.5 to 1.74.3.
- [Release notes](https://github.com/rclone/rclone/releases)
- [Changelog](https://github.com/rclone/rclone/blob/master/RELEASE.md)
- [Commits](rclone/rclone@v1.73.5...v1.74.3)

---
updated-dependencies:
- dependency-name: github.com/rclone/rclone
  dependency-version: 1.74.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat: add source_url field to HarnessConfig and Template Ent schemas

Add optional source_url field to track the import origin URL for
harness-configs and templates, enabling reimport/update flows.

* feat: thread source_url through store models, persistence, and import pipeline

Add SourceURL field to store.HarnessConfig, store.Template,
ResourceRecord, and the ent adapter CRUD operations. Update
Bootstrap() to accept sourceURL parameter and persist it on
create/update. Thread sourceURL from resourceDir through the
import worker loop.

* feat: add reimport endpoint, CLI update command, and source_url in show

- POST /api/v1/harness-configs/{id}/reimport endpoint that re-imports
  from stored source_url or an override URL
- CLI: scion harness-config update <name> [--url] [--all]
- Hub client: Reimport() method on HarnessConfigService
- Show command now displays source URL when present

* feat: add Refresh from Source button to harness-config detail page

- Display source URL as clickable link in metadata row
- Show Refresh from Source button when sourceUrl is present
- Button calls POST /api/v1/harness-configs/{id}/reimport
- Reload config details after successful reimport

* fix: add Reimport method to mockHarnessConfigService in tests

The mock was missing the new Reimport method added to the
HarnessConfigService interface, causing CI lint failures.

* fix: remove destructive migration flags from AutoMigrate

Remove WithDropColumn and WithDropIndex from Ent auto-migration.
These flags cause SQLite to silently drop data during table recreation
when schema changes occur between versions, which was the root cause
of URL-imported harness configs disappearing after server restarts.

* fix: add nameFilter param to importFromRemote call after upstream change

The importFromRemote signature gained a nameFilter []string parameter
upstream. Pass nil for the reimport endpoint since it reimports all
resources from the source URL.

* fix(harness): add no_auth config to antigravity harness

Without a no_auth section, selecting "no authentication" for the
antigravity harness causes the container to start with the full
agy-wrapper.sh command (dbus + gnome-keyring + AGY) instead of
dropping to a shell. This fails because AGY has no credentials.

Add no_auth behavior matching claude/gemini/codex/opencode harnesses.

* fix: address code review findings for reimport handler and CLI

C-1: Add user-scope authorization check in reimport handler. Without
this, any authenticated user could reimport another user's harness
config. Also add default-deny for unknown scopes.

M-1: Properly handle malformed JSON in reimport request body instead
of silently discarding parse errors.

H-1/H-2: Gate human-readable printf calls behind !isJSONOutput() in
the CLI update command so --format json produces clean structured
output.

* fix: address Gemini PR review feedback on reimport handler and CLI

- Use r.Body != nil && r.Body != http.NoBody instead of r.ContentLength
  for chunked/HTTP2 compatibility in reimport handler
- Add nil check for harness config after GetHarnessConfig lookup
- Use cmd.Context() instead of context.Background() in update command
- Add nil checks for resp after Hub List calls in both update functions

* fix: nil-check reimport result and return projectPath resolution errors

- Add nil check for result after Reimport() call to prevent panic on
  (nil, nil) return
- Return error when user-provided projectPath fails to resolve instead
  of silently ignoring it

---------

Co-authored-by: Scion Agent (harness-journey-p1-dev) <agent@scion.dev>
…CloudPlatform#450)

* fix: pass Hub-hydrated harness-config path to harness.Resolve

In broker mode, harness.Resolve() used FindHarnessConfigDir which only
searches local disk. Hub-managed harness configs hydrated into temp
directories were invisible to the resolver, causing it to fall back to
a Generic{} harness with no command config. This produced an empty
shell command (sh -c "") and the agent exited immediately.

Add ConfigDirPath to ResolveOptions so the broker can pass the
Hub-hydrated path directly. This ensures the harness config (including
no_auth, command, provisioner) is loaded correctly in broker mode.

* fix: pass Hub-hydrated path to harness.Resolve in ProvisionAgent

Same gap as run.go — provision.go:735 called harness.Resolve without
ConfigDirPath, so container-script harnesses were invisible in broker
mode during the provisioning step.

---------

Co-authored-by: Scion Agent (harness-journey-inv) <agent@scion.dev>
…s fallback (GoogleCloudPlatform#452)

* feat(web): harness-config detail delete, image section, and logs fallback

- Add delete button with confirmation dialog to harness-config detail
  page, matching the existing delete UX from the list view
- Add image status section below file list showing image path (local vs
  remote), last update time, and the Build button (moved from header)
- Add HarnessConfigData type to surface config.image from the API
- Always show the Logs tab on agent detail regardless of cloudLogging
  flag; fall back to broker-based /api/v1/agents/{id}/logs endpoint
  when Cloud Logging is not configured

* fix(hub): add user-scope authz to deleteHarnessConfig and default deleteFiles to false

Add missing user-scope ownership check in the delete handler — previously
user-scoped configs fell through with no authorization. Also deny unknown
scopes explicitly. Default the deleteFiles checkbox to unchecked so users
must opt in to file deletion.

* fix: address Gemini review feedback on harness-config detail

- Use explicit locale options for date formatting consistency
- Simplify delete response check (!response.ok is sufficient)

---------

Co-authored-by: Scion Agent (developer) <agent@scion.dev>
…oogleCloudPlatform#451)

When publishing a skill version, if the draft creation (step 1) succeeds
but upload/finalize (steps 2-3) fail or are interrupted, retrying from
step 1 would hit a UNIQUE constraint and return 409 Conflict, leaving
the user stuck.

Now when a version with the same number already exists as a draft, the
endpoint returns the existing draft with fresh upload URLs instead of
rejecting. Published/deprecated/archived versions still return 409.

Co-authored-by: Scion Agent (skill-bank-publish-fix) <agent@scion.dev>
…udPlatform#456)

* feat(config): add name field to harness-config config.yaml

When importing a harness-config from a repo where the subfolder name
doesn't match the desired config name, the import derives the wrong
name. Add a `name` field to HarnessConfigEntry so config authors can
declare the intended name in config.yaml.

Name resolution priority: CLI --name flag > config.yaml name field >
harness field > URL/directory-derived name.

* fix: validate name field against path traversal and read it in parent import

Sanitize the config.yaml name field in LoadHarnessConfigDir to reject
values containing path separators or traversal sequences (e.g. "../foo").
Also apply the same name-override logic to the parent-directory case in
discoverResourceDirs so each child's config.yaml name is honored during
bulk import, matching the existing leaf-case behavior.

* fix: add path-traversal test and schema pattern for config name field

Add unit test verifying that names like '../evil', 'sub/dir', and '..'
are rejected by LoadHarnessConfigDir. Add a regex pattern constraint
(^[a-zA-Z0-9][a-zA-Z0-9_-]*$) to the name field in the JSON schema
so invalid names are caught at validation time.

* fix: use strings.ContainsAny for platform-independent path traversal validation

Replace filepath.Base-based validation in LoadHarnessConfigDir with
strings.ContainsAny to catch backslash separators on all platforms.
Strengthen install name validation to also reject ".." and paths with
separators.

---------

Co-authored-by: Scion Agent (harness-name-dev) <agent@scion.dev>
…GoogleCloudPlatform#457)

Replace uname -m with Docker's TARGETARCH build arg, which is correct
under cross-compilation and emulation. Also restore explicit tar member
name to only extract the expected binary.

Co-authored-by: Scion Agent (antigravity-fix-dev) <agent@scion.dev>
* feat(skills): combined create + publish flow on skill creation page

Rewrite the skill creation page with Phase 1 of the create-ux-rework:

- SKILL.md content textarea with monospace font and frontmatter template
  placeholder. Supports paste and file upload.
- Client-side YAML frontmatter parsing (js-yaml) with debounced
  auto-populate of name, description, and tags fields.
- Manual edit tracking (editedFields Set) prevents auto-populate from
  overwriting user changes. "from SKILL.md" badges indicate
  auto-populated fields. "Reset from SKILL.md" clears overrides.
- Additional files drop zone with drag-and-drop support. Section header
  changes contextually based on whether SKILL.md content exists.
- Combined create + publish flow: inline progress view with multi-step
  state machine (form → creating → uploading → finalizing → done).
- File uploads use signed-URL pattern with SHA-256 hashing and
  concurrency-4 worker pool, matching the existing publish dialog.
- Per-file upload status indicators and retry support for failed uploads.
- Error recovery: "Retry Failed" for upload errors, "Go to Skill" link
  when create succeeds but publish fails.
- File validation: SKILL.md required for publish, max 50 files, 10 MB
  per file, 50 MB total, 512 KB paste limit.
- Conditional submit buttons: "Create & Publish v{version}" + "Create
  Only" when files present, "Create Skill" when no files.
- Version input defaults to 1.0.0, shown only when files are present.

* fix(skills): address code review feedback on create+publish flow

M1: Fix retryPublish() when version creation (step 2) fails. Track
versionCreated flag; when false, retry re-attempts from step 2 (create
version + upload) instead of falling through to finalize with no draft.

m1: Store redirect setTimeout IDs in redirectTimer property and clear
in disconnectedCallback to prevent stale history pushes on early unmount.

m2: Improve semver regex to allow hyphens in pre-release and optional
+build metadata segments.

m3: Add cleanVersion() helper that strips v-prefix before sending
version to API, not just for validation.

m4: Add @types/js-yaml dev dependency, remove @ts-ignore suppression.

* fix(skills): memoize allFiles, handle network errors, optimize failed-file lookup

- Cache allFiles getter result, invalidate only when skillMdContent or
  additionalFiles change to avoid recreating Blob/File on every render
- Wrap fetch retry paths in try/catch to handle network-level errors
  (DNS, CORS, connection refused) with user-friendly messages
- Use Map for uploadUrl lookup and Set for failed indices to avoid
  O(K*N) iteration in retry paths

* fix(skills): address 4 review findings in skill-create

- Clear skillMdContent and revert textarea on oversized paste
- Check validationError in validateForPublish and validateForCreate
- Clean up uploadFiles retry logic with consistent doUpload helper

---------

Co-authored-by: Scion Agent (p1-web-dev) <agent@scion.dev>
* Add diagnostic logging when pipeline receives data without cloud exporter

handleMetrics(), handleSpans(), and handleLogs() previously returned
silently when the cloud exporter was nil, causing complete data loss with
no diagnostic signal. Add a sync.Once warning on first invocation so
operators can see that telemetry data is being received but dropped.

* Resolve GCP project ID from metadata server as fallback

LoadConfig() now queries the GCE metadata server when no explicit
SCION_GCP_PROJECT_ID or credentials file provides a project ID.
The compute/metadata package respects GCE_METADATA_HOST, so this
transparently reaches the scion metadata emulator (localhost:18380)
in agent containers. A 2-second timeout prevents blocking startup
when no metadata server is reachable.

This fixes the primary blocker where the pipeline cloud exporter
failed with "GCP project ID is required" on instances like
scion-integration2 that have a working metadata server but no
explicit credentials file or env var.

* Route hook metrics through pipeline to prevent sampling rate violations

Each sciontool hook invocation previously created its own Cloud Monitoring
exporter, causing ~50% of metric writes to be rejected when hooks fired
within Cloud Monitoring's minimum sampling period.

Two changes fix this:
1. In GCP mode, short-lived processes (hooks, batch=false) now send
   metrics via OTLP to the local pipeline receiver instead of directly
   to Cloud Monitoring. This mirrors how logs already work.
2. The pipeline buffers incoming metrics and flushes to Cloud Monitoring
   every 15 seconds, consolidating rapid writes into safe batches.

Long-lived processes (init, batch=true) continue exporting directly
to Cloud Monitoring since they maintain stable cumulative counters.

* Register ModelResponse hook to enable token metrics from Claude Code

Claude Code's ModelResponse hook event carries per-turn token counts
(input_tokens, output_tokens, cached_tokens) but was not registered
in the embedded settings.json. The dialect parser and metric recording
code already handle this event type — only the registration was missing.

This enables gen_ai.tokens.input, gen_ai.tokens.output, and
gen_ai.tokens.cached metrics to be recorded from each model turn.
Also updates MaxModelCalls capability to SupportYes since model-end
events are now emitted.

* Skip metadata server query in tests to avoid unnecessary network calls

Move the testing.Testing() check before the metadata query instead of
after, so tests never make a network call to the metadata server only
to have the result discarded by the defense-in-depth guard.

* Update capability test to match ModelResponse hook registration

The Claude Code harness now supports MaxModelCalls (SupportYes) since
ModelResponse hooks are registered. Update the test expectation.

* Deduplicate buffered metrics before export to Cloud Monitoring

When multiple hook processes report the same cumulative counter (e.g.
agent.tool.calls with tool=Read) within the 15-second buffer window,
their data points conflict in Cloud Monitoring's sampling rate check.

The pipeline now deduplicates by (metric name, attribute set) before
exporting, keeping only the latest data point per combination. This
eliminates the remaining sampling-rate violations that the buffering
alone didn't solve.

* Remove metadata server fallback for GCP project ID resolution

The GCP project ID for metrics should be derived from the injected
telemetry service account credentials file, not the metadata server.
The metadata server returns the infrastructure project (where the VM
runs), which may differ from the intended metrics target project
specified in the SA key JSON.

* Enforce minimum interval between metric buffer flushes

Track the last successful export time and skip flushes that occur within
metricFlushInterval (15s) of the previous export. This prevents the
shutdown drain in Pipeline.Stop() from racing with a recent periodic
flush, which caused Cloud Monitoring to reject cumulative data points
written less than 10 seconds apart.

* Add metrics dashboard to hub admin UI

Backend: MetricsDashboardService queries Cloud Monitoring for Scion
telemetry metrics (sessions, API calls, tokens, active agents).
Supports summary, sessions, model-calls, and tokens views with
daily aggregation and 5-minute result caching.

Frontend: Lit page component with Chart.js charts, tabs for each
view, and period selector (7/14/30 days). Wired into admin nav
and route system.

* Address code review: fix error handling and label extraction

- Return errors on partial Cloud Monitoring query failures instead of
  silently caching empty results. Only cache fully successful responses.
  Partial data is still returned with an X-Metrics-Warning header.
- Extract the specific requested label key from groupByLabel instead of
  iterating over arbitrary map entries. Prevents mislabeling when series
  contain multiple metric labels.

* Fix ALIGN_SUM → ALIGN_DELTA for cumulative metrics

Cloud Monitoring requires ALIGN_DELTA (not ALIGN_SUM) for CUMULATIVE
counter metrics. ALIGN_SUM is only valid for GAUGE metrics.

* feat: move metrics dashboard from admin to main nav with relaxed auth

Phase 1 of metrics dashboard refactoring:
- Rename admin-metrics.ts → metrics-dashboard.ts, update element tag
  from scion-page-admin-metrics to scion-page-metrics
- Add /metrics route to main ROUTES, keep /admin/metrics as backward-compat redirect
- Move "Metrics" nav item from Admin section to Management section in sidebar
- Relax backend authorization from admin-only to all authenticated users
- Register new /api/v1/metrics/ endpoint, keep legacy /api/v1/admin/metrics-dashboard
- Update app-shell PAGE_TITLES
- Remove scion-page-admin-metrics from ADMIN_ROUTES set

* feat: add per-project metrics views with parameterized queries

Phase 2 of metrics dashboard refactoring:

Backend:
- Add QueryOption/WithProjectID functional options pattern for query parameterization
- Extend all query methods (QuerySummary, QuerySessions, QueryModelCalls, QueryTokens)
  to accept ...QueryOption and thread project_id filter through low-level query functions
- Add extraFilter parameter to queryGroupedTimeSeries, queryUniqueLabels,
  queryDailyUniqueCount for project-scoped Cloud Monitoring queries
- Add handleProjectMetricsDashboard handler with project view authorization
- Register /projects/:id/metrics sub-route in handleProjectRoutes
- Filter on metric.labels.grove_id (canonical label from GCP exporter)

Frontend:
- Add projectId property to metrics-dashboard.ts component
- Parse projectId from URL on connectedCallback (/projects/:id/metrics pattern)
- Route API calls to project-scoped endpoint when projectId is set
- Update header title to show "Project Metrics" when project-scoped
- Add /projects/:id/metrics route in main.ts (before catch-all project route)
- Add "Project Metrics" page title pattern in app-shell.ts
- Add "Metrics" button in project-detail.ts header-actions

* feat: add metrics summary row to project detail page

Phase 3 of metrics dashboard refactoring:

Backend:
- Add ProjectMetricsSummary struct with 24h scalar counters
  (sessions, API calls, tokens, active agents)
- Add QueryProjectSummary method with lightweight scalar-aggregate queries
- Add handleProjectMetricsSummary handler with project view authorization
- Register /projects/:id/metrics-summary sub-route (before /metrics to avoid
  prefix-match collision)
- Return {available: false} when metrics service is not configured

Frontend:
- Add metricsSummary state variable to project-detail.ts
- Fetch summary in loadData() as non-blocking parallel call
- Render metrics stats row below existing stats row when data available
- Include "View Details" link to /projects/:id/metrics
- Gracefully hide metrics row when unavailable (no error states)
- Add formatTokenCount helper for human-readable token numbers

* fix: address code review findings for metrics dashboard

- Fix H-1: Remove dead /dashboard suffix from frontend API path — use
  /api/v1/metrics/ directly to match backend registration
- Fix H-2: Add explicit code comment on handleAdminMetricsDashboard
  documenting the intentional auth relaxation on the legacy endpoint
- Fix M-1: Mark unused subPath parameter with _ in handleProjectMetricsDashboard
- Fix M-2: Use cacheKeySuffix() helper to avoid trailing colon in cache keys
  when ProjectID is empty (global queries)

* Add graph-up icon to Shoelace icon allowlist

The metrics dashboard nav item uses the graph-up Bootstrap Icon but it
was missing from the USED_ICONS array in the copy script, so it was
never copied to the public assets directory.

* Address PR GoogleCloudPlatform#458 review findings: shutdown race, dedup stability, sort order

Pipeline (pipeline.go):
- Add sync.WaitGroup to coordinate metric flush goroutine shutdown,
  preventing race between final flush and ticker-driven loop
- Add force parameter to flushMetricBuffer to bypass interval check
  during shutdown, preventing loss of buffered metrics
- Sort OTel attributes by key in attrSetKey to ensure stable
  deduplication keys regardless of protobuf attribute ordering

Backend (metrics_dashboard.go):
- Sort queryGroupedTimeSeries results by label for deterministic API
  responses and stable chart legend ordering
- Sort queryDailyUniqueCount results by timestamp for chronological
  ordering

Frontend (metrics-dashboard.ts):
- Move chart.destroy() inside requestAnimationFrame callback to prevent
  overlapping Chart.js instances during rapid tab switching

* Fix CI failures: gofmt formatting and errcheck in metrics dashboard

Run gofmt on metrics_dashboard.go (const block alignment) and
server.go (struct field alignment). Handle the error return from
metricsDashboard.Close() during shutdown to satisfy errcheck.

* Use project_id label instead of grove_id in metrics filter

The compat-literals CI check prohibits new grove terminology. The
telemetry exporter emits both grove_id and project_id labels, so
filtering on project_id is correct and follows project vocabulary.

---------

Co-authored-by: Scion Agent (metrics-validation-lead) <agent@scion.dev>
…latform#459)

Add a "Capture Auth" button visible only for no-auth running agents with
a resolved harness. The button calls POST /api/v1/agents/{id}/exec to
run capture_auth.py inside the container, handling exit codes 0 (success),
2 (not authenticated yet), and others (error).

Co-authored-by: Scion Agent (capture-auth-ui-dev) <agent@scion.dev>
…uth (GoogleCloudPlatform#460)

Replace gnome-keyring/DBUS token injection with direct file-based OAuth
token placement. The token is now written to
~/.gemini/antigravity-cli/antigravity-oauth-token by provision.py,
which AGY reads directly — no keyring daemons needed.

Changes:
- Dockerfile: remove dbus-x11, gnome-keyring, libsecret-1-0, libsecret-tools
- config.yaml: rename AGY_KEYRING_TOKEN→AGY_TOKEN (file secret), require
  GOOGLE_CLOUD_PROJECT + GOOGLE_CLOUD_LOCATION for vertex-ai, update
  capabilities to reflect file-based auth
- provision.py: remove _parse_env_output, keyring/DBUS wrapper logic;
  write token file directly; update _select_auth_method signature to
  accept secret_files
- capture_auth.py: new file-based capture script (standard pattern)
  replacing keyring extraction

Co-authored-by: Scion Agent (antigravity-nokey-dev) <agent@scion.dev>
…udPlatform#461)

AGY requires gnome-keyring to be initialized before it will write
~/.gemini/antigravity-cli/antigravity-oauth-token. The previous commit
removed keyring code, breaking both the no-auth capture flow and normal
auth flow. Restore keyring packages, DBUS/keyring initialization in the
wrapper script, and token injection via secret-tool — but keep the
AGY_TOKEN rename (not reverting to AGY_KEYRING_TOKEN).

Co-authored-by: Scion Agent (antigravity-keyring-restore) <agent@scion.dev>
…h GCP env (GoogleCloudPlatform#462)

The wrapper script's GCP gate only triggers on the enterprise marker
file or AGY_USE_GCP env var. For oauth-token auth with GOOGLE_CLOUD_PROJECT
injected into the container, neither condition is met so the gcp block
is never written to settings.json.

Add an elif to also set _use_gcp=true when GOOGLE_CLOUD_PROJECT is
present. The inner block already guards against empty project values.

Co-authored-by: Scion Agent (antigravity-gcp-dev) <agent@scion.dev>
GoogleCloudPlatform#463)

When config.yaml specifies an image field (e.g. "scion-cogo:latest"),
use its name portion as the output image base name instead of always
using the harness-config CLI argument. This ensures the built image
matches the intended name from the config rather than the directory name.

Co-authored-by: Scion Agent (build-image-name-dev) <agent@scion.dev>
GoogleCloudPlatform#464)

When config.yaml specifies an image field (e.g. "scion-cogo:latest"),
use its name portion as the output image base name instead of always
using the harness-config CLI argument. This ensures the built image
matches the intended name from the config rather than the directory name.

Co-authored-by: Scion Agent (build-image-name-dev) <agent@scion.dev>
…eCloudPlatform#465)

File-type secrets in scion are bind-mounted directly to their target
path (~/.gemini/antigravity-cli/antigravity-oauth-token), not staged
to the env secret directory (~/.scion/harness/secrets/AGY_TOKEN).

Add fallback reads to the bind-mounted target path in three places:
- _select_auth_method: detect token presence for method selection
- _provision: read token value for JSON validation
- agy-wrapper.sh: inject token into keyring at runtime

Co-authored-by: Scion Agent (antigravity-auth-fix-dev) <agent@scion.dev>
* Add PROJECT_SLUG variable and Agent Registry integration test

Add PROJECT_SLUG as a trusted lifecycle hook variable so hooks can
construct per-agent A2A Bridge URLs for Google Cloud Agent Registry
integration. The variable is populated from the project's slug field
in the store.

Add an integration test that validates the full Agent Registry flow:
agent transitions to running → hook POSTs an A2A agent card to the
Agent Registry API with the specific per-agent A2A Bridge endpoint
URL → agent stops → hook DELETEs the registration.

Also adds Agent Registry integration example to lifecycle hooks docs.

* Use distinct Name and Slug in executor render vars test

Addresses QA nit: the test previously used the same value for both
project Name and Slug, which couldn't distinguish between reading
project.Name vs project.Slug. Now uses "Test Project Display Name"
for Name and "test-project-slug" for Slug.

* Fix ent adapter VerificationStatus and update Agent Registry body format

- Fix entGCPToStore() to derive VerificationStatus from Verified boolean,
  which was causing lifecycle hooks to reject verified SAs as unverified
  when using the ent/Postgres store backend.
- Update Agent Registry integration test and docs to use the proto-based
  A2A agent card format with supportedInterfaces, skills, and provider
  fields (protocolBinding: JSONRPC, protocolVersion: 0.3).

* Allow project slug in listAgents projectId query parameter

The A2A Bridge passes the project slug (e.g. "global") as the projectId
query parameter when listing agents, but the Hub expected a UUID. Now
the listAgents handler detects non-UUID values and resolves them via
GetProjectBySlug before filtering.

---------

Co-authored-by: Scion Agent (agent-registry-integration-lead) <agent@scion.dev>
GoogleCloudPlatform#468)

When config.yaml specifies an image field (e.g. "scion-cogo:latest"),
use its name portion as the output image base name instead of always
using the harness-config CLI argument. This ensures the built image
matches the intended name from the config rather than the directory name.

Co-authored-by: Scion Agent (build-image-name-dev) <agent@scion.dev>
…eCloudPlatform#469)

File-type secrets in scion are bind-mounted directly to their target
path (~/.gemini/antigravity-cli/antigravity-oauth-token), not staged
to the env secret directory (~/.scion/harness/secrets/AGY_TOKEN).

Add fallback reads to the bind-mounted target path in three places:
- _select_auth_method: detect token presence for method selection
- _provision: read token value for JSON validation
- agy-wrapper.sh: inject token into keyring at runtime

Co-authored-by: Scion Agent (antigravity-auth-fix-dev) <agent@scion.dev>
…elope (GoogleCloudPlatform#471)

AGY stores refresh_token nested under a "token" key with an
"auth_method" envelope, but the validation only checked the top level.
Accept both layouts in provision.py and capture_auth.py.

Co-authored-by: Scion Agent (antigravity-token-fmt-dev) <agent@scion.dev>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a telemetry metrics dashboard querying Google Cloud Monitoring, adds support for tracking and re-importing harness-configs and templates from remote source URLs, and updates the Antigravity harness to use file-based OAuth tokens with an interactive credential capture flow. It also adds PROJECT_SLUG to trusted lifecycle hook variables, enables combined create-and-publish flows for skills, and implements metric batching and deduplication in the telemetry pipeline to prevent sampling-rate violations. Review feedback focuses on addressing potential nil pointer dereferences in metric merging and metrics querying, correcting image tag parsing when registry ports are present, ensuring proper environment variable fallback for AGY_TOKEN during provisioning, and validating JSON types in the credential capture script.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread cmd/build.go
Comment on lines +93 to +100
imageBaseName := harnessConfigName
if hcDir.Config.Image != "" {
name := hcDir.Config.Image
if colonIdx := strings.LastIndex(name, ":"); colonIdx >= 0 {
name = name[:colonIdx]
}
imageBaseName = name
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If the image name contains a registry port but no tag (e.g., localhost:5001/my-image), strings.LastIndex(name, ":") will match the port's colon instead of a tag separator. This results in the image name being incorrectly truncated to just the registry host (e.g., localhost).

To prevent this, ensure that the matched colon appears after the last slash / in the image path.

Suggested change
imageBaseName := harnessConfigName
if hcDir.Config.Image != "" {
name := hcDir.Config.Image
if colonIdx := strings.LastIndex(name, ":"); colonIdx >= 0 {
name = name[:colonIdx]
}
imageBaseName = name
}
imageBaseName := harnessConfigName
if hcDir.Config.Image != "" {
name := hcDir.Config.Image
if colonIdx := strings.LastIndex(name, ":"); colonIdx >= 0 {
if slashIdx := strings.LastIndex(name, "/"); slashIdx < colonIdx {
name = name[:colonIdx]
}
}
imageBaseName = name
}

Comment on lines +79 to +86
has_refresh = (
"refresh_token" in obj
or (isinstance(obj.get("token"), dict) and "refresh_token" in obj["token"])
)
if not isinstance(obj, dict) or not has_refresh:
print("capture-auth: token file missing refresh_token field", file=sys.stderr)
return False
return True

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If obj is not a dictionary (e.g., it is None or a list), accessing obj.get("token") or checking "refresh_token" in obj will raise an AttributeError or TypeError before the isinstance(obj, dict) check is reached on line 83.

Validate that obj is a dictionary first before checking for the presence of the refresh token.

Suggested change
has_refresh = (
"refresh_token" in obj
or (isinstance(obj.get("token"), dict) and "refresh_token" in obj["token"])
)
if not isinstance(obj, dict) or not has_refresh:
print("capture-auth: token file missing refresh_token field", file=sys.stderr)
return False
return True
if not isinstance(obj, dict):
print("capture-auth: token file is not a JSON object", file=sys.stderr)
return False
has_refresh = (
"refresh_token" in obj
or (isinstance(obj.get("token"), dict) and "refresh_token" in obj["token"])
)
if not has_refresh:
print("capture-auth: token file missing refresh_token field", file=sys.stderr)
return False
return True

home: str,
) -> tuple[str, str]:
has_keyring = "AGY_KEYRING_TOKEN" in env_keys
has_token = "AGY_TOKEN" in secret_files or bool(_read_secret(secret_files, "AGY_TOKEN"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If AGY_TOKEN is passed as an environment variable, it is detected and added to env_keys by _present_env_keys, but _select_auth_method currently ignores env_keys when checking for AGY_TOKEN. This breaks environment-variable-based token injection.

Include a check for "AGY_TOKEN" in env_keys.

Suggested change
has_token = "AGY_TOKEN" in secret_files or bool(_read_secret(secret_files, "AGY_TOKEN"))
has_token = "AGY_TOKEN" in env_keys or "AGY_TOKEN" in secret_files or bool(_read_secret(secret_files, "AGY_TOKEN"))

Comment on lines +607 to 608
token_raw = _read_secret(secret_files, "AGY_TOKEN")
if not token_raw:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If AGY_TOKEN is passed as an environment variable, _read_secret will return an empty string, causing the validation to fail and the provisioner to exit with an error. Fall back to reading from os.environ if the secret file is not present.

Suggested change
token_raw = _read_secret(secret_files, "AGY_TOKEN")
if not token_raw:
token_raw = _read_secret(secret_files, "AGY_TOKEN") or os.environ.get("AGY_TOKEN")
if not token_raw:

Comment on lines +473 to +479
func mergeMetricDataPoints(a, b *metricpb.Metric) *metricpb.Metric {
aSum, aOK := a.Data.(*metricpb.Metric_Sum)
bSum, bOK := b.Data.(*metricpb.Metric_Sum)
if !aOK || !bOK {
// For non-Sum metrics, keep the one with the latest timestamp
return b
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If either aSum.Sum or bSum.Sum is nil, accessing aSum.Sum.DataPoints or bSum.Sum.DataPoints later in the function will cause a nil pointer dereference panic. Add defensive nil checks for aSum.Sum and bSum.Sum before proceeding.

Suggested change
func mergeMetricDataPoints(a, b *metricpb.Metric) *metricpb.Metric {
aSum, aOK := a.Data.(*metricpb.Metric_Sum)
bSum, bOK := b.Data.(*metricpb.Metric_Sum)
if !aOK || !bOK {
// For non-Sum metrics, keep the one with the latest timestamp
return b
}
func mergeMetricDataPoints(a, b *metricpb.Metric) *metricpb.Metric {
aSum, aOK := a.Data.(*metricpb.Metric_Sum)
bSum, bOK := b.Data.(*metricpb.Metric_Sum)
if !aOK || !bOK || aSum.Sum == nil || bSum.Sum == nil {
// For non-Sum metrics or if Sum is nil, keep the one with the latest timestamp
return b
}

Comment on lines +211 to +212
with os.fdopen(fd, "w") as f:
f.write(token)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Specify encoding="utf-8" when opening the temporary file to ensure consistent behavior across different platforms and environments.

Suggested change
with os.fdopen(fd, "w") as f:
f.write(token)
with os.fdopen(fd, "w", encoding="utf-8") as f:
f.write(token)

Comment on lines +386 to +388
for _, p := range ts.GetPoints() {
total += p.GetValue().GetInt64Value()
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Defensively check if p.GetValue() is nil before calling GetInt64Value() to prevent potential nil pointer dereference panics.

		for _, p := range ts.GetPoints() {
			if val := p.GetValue(); val != nil {
				total += val.GetInt64Value()
			}
		}

…rm#470)

* skill-bank M5: address Gemini code review findings

- Replace context.Background() with cmd.Context() in CLI commands (Ctrl+C support)
- Use pointer fields in UpdateSkillRegistryRequest for proper partial updates
- Make URI scheme detection case-insensitive (RFC 3986 compliance)
- Make GitHub URI parsing case-insensitive
- Use store constants instead of hardcoded strings in skill federation
- Return lowercase scheme from SkillURIScheme
- Fix PinnedHashes nil check to allow clearing pinned hashes

* skill-bank M5: fix URI normalization and client-side pointer fields

* skill-bank: fix empty-string updates for AuthToken and ResolvePath in store

* skill-bank: fix gofmt import ordering in handlers.go

---------

Co-authored-by: Scion Agent (skill-bank) <agent@scion.dev>
@ptone ptone force-pushed the scion/antigravity-capture-keyring branch from 3fac37c to 46d5080 Compare June 22, 2026 20:49
ptone and others added 2 commits June 22, 2026 21:52
…ten by AGY (GoogleCloudPlatform#477)

After no-auth login, AGY writes the OAuth token to gnome-keyring but may
not persist it to ~/.gemini/antigravity-cli/antigravity-oauth-token. The
capture_auth.py script only checked for the file, failing with "source
not found".

Two changes:
- provision.py: save DBUS_SESSION_BUS_ADDRESS to ~/.scion/harness/.dbus-env
  after keyring init so other processes can access the keyring
- capture_auth.py: add keyring fallback via secret-tool lookup when
  file-based capture finds nothing

Co-authored-by: Scion Agent (antigravity-capture-fix-dev) <agent@scion.dev>
…orm#474)

* feat(skill-bank): add DeleteSkillVersion to store interface

Add DeleteSkillVersion method for cleaning up draft version records
when a multipart upload fails mid-way. Only draft versions can be
deleted — published/deprecated/archived versions are protected.

* feat(hub): add multipart upload handler for skill version publishing

Add publishSkillVersionMultipart to handle inline file uploads via
multipart/form-data, enabling single-request skill version publishing
without the two-phase signed-URL flow.

The handler validates semver versions, enforces file constraints (max 50
files, 10MB per file, 50MB total, SKILL.md required), sanitizes
filenames against path traversal, uploads files concurrently to storage,
computes content hashes, and atomically publishes the version.

Content-type dispatch in publishSkillVersion routes multipart requests
to the new handler while preserving the existing JSON flow.

* fix(hub): wire DeleteSkillVersion into multipart upload error path

Replace the TODO placeholder with the actual DeleteSkillVersion call
now that the store method is available.

* feat(web): simplify skill publish dialog to use multipart POST

Replace the 3-step signed-URL upload flow (create draft → upload files
individually → finalize) with a single multipart POST. Server now
handles hash computation, storage upload, and publishing atomically.

Removes: client-side SHA-256, concurrent upload semaphore, per-file
retry logic, finalize step, per-file status tracking.

* feat(web): add combined create + publish flow to skill create page

Add optional "Publish first version" toggle on the create page that
enables a two-step combined flow: create the skill, then publish the
first version via multipart POST. Replaces the previous N+2 API call
pattern with just two calls.

If publish fails after skill creation, offers "Retry Publishing" and
"Go to Skill" recovery options. Existing create-only flow is unchanged
when the toggle is off.

* fix(hub): clean up draft version on all multipart error paths

Address Gemini review findings: delete draft version when storage is
not configured or when the final UpdateSkillVersion to published fails,
preventing the unique constraint from blocking retries.

* feat(web): simplify skill create page to use multipart POST

Replace the 3-step signed-URL upload flow in the combined create +
publish path with a single multipart POST. Server handles hashing,
storage upload, and publishing atomically.

Removes: client-side SHA-256, concurrent upload semaphore, per-file
retry, finalize step. Preserves: YAML frontmatter parsing, file
selection UI, scope/visibility/tags, create-only flow.

---------

Co-authored-by: Scion Agent (skill-bank-em-review) <agent@scion.dev>
@ptone ptone force-pushed the scion/antigravity-capture-keyring branch from 426f117 to abafc2c Compare June 22, 2026 21:33
ptone and others added 3 commits June 22, 2026 22:34
… exists" as skip (GoogleCloudPlatform#478)

Config may list the same secret key in multiple auth types (e.g. AGY_TOKEN
from both oauth-token and vertex-ai required_files). This caused confusing
output: first iteration succeeds, second fails with "already exists", and
the summary shows both a capture and an error.

Deduplicate entries by key after loading, and treat the "already exists"
sciontool error as a silent skip rather than a reported error.

Co-authored-by: Scion Agent (capture-dedup-dev) <agent@scion.dev>
* Fix 5 review findings from PR GoogleCloudPlatform#467

- Handle all KeyValue types in attrSetKey (bool, double, zero int, empty string)
- Simplify duration alignment to use Truncate instead of float conversion
- Remove raw error messages from X-Metrics-Warning HTTP headers
- Move chart rendering side-effects from render() to updated() lifecycle

* Fix 2 MEDIUM review findings from PR GoogleCloudPlatform#467

---------

Co-authored-by: Scion Agent (metrics-validation-fix-dev) <agent@scion.dev>
When a secret already exists in the vault, _capture_one was returning
(False, None), causing the main loop to count it as neither captured nor
errored. This triggered the keyring fallback, which failed on the same
"already exists" condition and exited with an error.

Return (True, None) instead — already present means already captured.
@ptone ptone force-pushed the scion/antigravity-capture-keyring branch from 2b36cf6 to 90d7e3f Compare June 23, 2026 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants