Skip to content

Feat/picod prometheus metrics#400

Open
vanshika2720 wants to merge 1 commit into
volcano-sh:mainfrom
vanshika2720:feat/picod-prometheus-metrics
Open

Feat/picod prometheus metrics#400
vanshika2720 wants to merge 1 commit into
volcano-sh:mainfrom
vanshika2720:feat/picod-prometheus-metrics

Conversation

@vanshika2720

@vanshika2720 vanshika2720 commented Jun 22, 2026

Copy link
Copy Markdown

/kind feature

What type of PR is this?

/kind feature

What this PR does / why we need it:

PicoD currently exposes only a /health endpoint and does not provide any Prometheus-compatible metrics. This makes it difficult to monitor sandbox activity, understand request patterns, or observe command execution behavior.

This PR adds a Prometheus /metrics endpoint to PicoD using the existing prometheus/client_golang dependency already present in the module graph. The implementation introduces a dedicated Prometheus registry and exports metrics focused on request handling and command execution.

Metrics added:

  • picod_active_executions – Gauge tracking in-flight execute requests.
  • picod_execute_requests_total – Counter tracking execute request outcomes (success, error, timeout, invalid).
  • picod_http_requests_total – Counter tracking processed HTTP requests by method, path, and status code.
  • picod_http_request_duration_seconds – Histogram tracking request latency by method and path.

The /metrics endpoint is exposed similarly to /health and is excluded from gzip compression. The implementation also includes middleware-based request instrumentation and unit tests validating metric registration and collection behavior.

Which issue(s) this PR fixes:

Fixes #386

Special notes for your reviewer:

  • Uses a dedicated Prometheus registry instead of the global registry to avoid exposing default Go runtime metrics.
  • No new dependencies are introduced.
  • Metrics focus on PicoD's stateless execution model and avoid introducing artificial session concepts.
  • Added programmatic metric validation tests using Registry.Gather() rather than relying on text exposition output.

Does this PR introduce a user-facing change?:

PicoD now exposes a Prometheus-compatible `/metrics` endpoint that provides request, latency, and command execution metrics for observability and monitoring.

Copilot AI review requested due to automatic review settings June 22, 2026 18:42
@volcano-sh-bot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kevin-wangzefeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces Prometheus metrics to the PicoD server, adding a new /metrics endpoint and middleware to track active executions, request counts, and HTTP latencies. The feedback highlights a potential Denial of Service (DoS) vulnerability due to high cardinality in the metrics middleware when c.FullPath() is empty, and points out an incorrect metric label ("invalid" instead of "error") when directory creation fails with an HTTP 500 error.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread pkg/picod/metrics.go
Comment thread pkg/picod/execute.go Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Prometheus observability to PicoD (the sandbox-side execution daemon) by introducing a dedicated Prometheus registry, a /metrics endpoint, and request/execute instrumentation so operators can monitor request rates, latency, and command execution behavior.

Changes:

  • Introduces pkg/picod/metrics.go with a private Prometheus registry, collectors, and Gin middleware for HTTP request counting/latency.
  • Exposes /metrics in the PicoD Gin server and excludes it from gzip compression.
  • Instruments ExecuteHandler to emit execution outcome counters and an “active executions” gauge; adds unit tests validating metric collection via Registry.Gather().

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
pkg/picod/server.go Wires metrics middleware and /metrics endpoint into PicoD’s Gin server and gzip exclusions.
pkg/picod/metrics.go Implements PicoD Prometheus registry, collectors, /metrics handler, and HTTP instrumentation middleware.
pkg/picod/metrics_test.go Adds tests that exercise /metrics, execute a command, and validate gathered metric families.
pkg/picod/execute.go Adds execution-level metric updates (active gauge + outcome counter).
go.mod Promotes prometheus/client_golang to a direct dependency.
go.sum Tidies module checksums accordingly.

Comment thread pkg/picod/server.go
Comment thread pkg/picod/metrics.go
Comment thread pkg/picod/execute.go
Comment thread pkg/picod/execute.go Outdated
Comment thread pkg/picod/metrics_test.go Outdated
Comment thread pkg/picod/metrics_test.go Outdated
@codecov-commenter

codecov-commenter commented Jun 22, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 96.90722% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.49%. Comparing base (524e55e) to head (622a676).
⚠️ Report is 146 commits behind head on main.

Files with missing lines Patch % Lines
pkg/picod/metrics.go 96.42% 1 Missing and 1 partial ⚠️
pkg/picod/execute.go 97.29% 1 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #400       +/-   ##
===========================================
+ Coverage   47.57%   59.49%   +11.92%     
===========================================
  Files          30       38        +8     
  Lines        2819     3570      +751     
===========================================
+ Hits         1341     2124      +783     
+ Misses       1338     1237      -101     
- Partials      140      209       +69     
Flag Coverage Δ
unittests 59.49% <96.90%> (+11.92%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@vanshika2720 vanshika2720 force-pushed the feat/picod-prometheus-metrics branch from 5dca3fa to 482fcaf Compare June 22, 2026 20:18
Implement a /metrics Prometheus endpoint in PicoD to monitor command
executions, request latency, and HTTP requests.
- Add metrics for active executions and request outcomes.
- Add Gin middleware for path/method/status metrics.
- Exclude /metrics from auth and gzip compression.
- Add unit tests.

Signed-off-by: Vanshika <pahalvanshikaa@gmail.com>
Copilot AI review requested due to automatic review settings June 22, 2026 21:06
@vanshika2720 vanshika2720 force-pushed the feat/picod-prometheus-metrics branch from 482fcaf to 3e4ac99 Compare June 22, 2026 21:06

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings June 25, 2026 09:04
@vanshika2720 vanshika2720 force-pushed the feat/picod-prometheus-metrics branch from 622a676 to 3e4ac99 Compare June 25, 2026 09:04

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread pkg/picod/metrics_test.go
@vanshika2720

Copy link
Copy Markdown
Author

@hzxuzhonghu Please review it.

@ranxi2001

Copy link
Copy Markdown
Contributor

I did a local pass over the current head (3e4ac99) and the PicoD package tests pass for me:

  • go test ./pkg/picod -count=1
  • go test -race ./pkg/picod -count=1
  • go test ./pkg/picod -run TestMetrics_Exposition -count=20

One issue still looks worth fixing before merge:

s.metrics.Middleware() is currently registered after maxBodySizeMiddleware() in pkg/picod/server.go. Requests rejected by the body-size middleware return 413 and abort before the metrics middleware runs, so they are missing from picod_http_requests_total and picod_http_request_duration_seconds.

I verified this locally with a temporary test that sends an oversized POST /api/execute request. The response is 413, but there is no picod_http_requests_total{method="POST", path="/api/execute", status_code="413"} sample. Moving the metrics middleware before maxBodySizeMiddleware() makes the same test pass.

Suggested focused test coverage:

  1. Send an oversized POST /api/execute.
  2. Assert the response status is 413.
  3. Assert picod_http_requests_total includes method="POST", path="/api/execute", status_code="413".

Two smaller follow-ups:

  • picod_active_executions currently counts the whole execute handler, including JSON parsing and validation. If the intended metric is active command executions, the gauge should wrap only the cmd.Run() window; otherwise the name/help text should make it clear that this is active execute handler requests.
  • It would be useful to add a regression test for unmatched routes to ensure the path label stays fixed as unmatched and never uses the raw URL path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[v0.2.0] Call for proposals

5 participants