feat(serverless): implement warm container pool for function reuse #594

Open
poyrazK wants to merge 39 commits into main from worktree-serverless-warm-pool

@poyrazK poyrazK commented May 18, 2026

Owner

Summary

Implements per-function warm container/VM pools to eliminate cold starts on repeated function invocations. Containers/VMs are reused across invocations, reducing latency from 2-5s to ~100ms.

Changes

New Files

  • internal/core/ports/pool.go - PoolManager interface, PoolConfig, PoolInstance, PoolStats
  • internal/core/pool/pool.go - PoolManagerImpl, FunctionPool with Acquire/Release/Reaper
  • cmd/firecracker-agent/ - Guest agent skeleton for Firecracker vsock exec

Modified Files

| File | Change |
| --- | --- |
| internal/core/domain/function.go | Added PoolConfig struct and field |
| internal/core/ports/compute.go | Added StartPoolInstance/ExecInInstance/GetInstanceReady |
| internal/core/services/function.go | Pool integration with pooled/cold invocation paths |
| internal/repositories/docker/adapter.go | Full pool method implementation |
| internal/repositories/firecracker/adapter.go | VSOCK config + ExecInInstance via vsock |
| internal/repositories/libvirt/adapter.go | Pool methods with serial console exec |
| internal/repositories/libvirt/libvirt_client.go | Added DomainOpenConsole interface |
| internal/repositories/libvirt/real_client.go | DomainOpenConsole implementation |
| internal/repositories/libvirt/templates.go | Serial console + QEMU agent channel in domain XML |
| internal/repositories/postgres/function_repo.go | pool_config persistence |

Key Design Decisions

  • Per-function pools: Prevents noisy neighbor issues
  • Fresh code mount: Code extracted/mounted per invocation (simpler than code versioning in pool)
  • Backpressure + scale-out: Below MaxSize, spawn a new instance when no warm one is free; at MaxSize, wait for an instance to become available
  • Idle reaping: Background goroutine reaps warm instances idle > MaxIdleTime
  • Docker exec: Uses native ContainerExecCreate/Attach
  • Libvirt exec: Uses serial console via DomainOpenConsole API
  • Firecracker exec: Uses VSOCK device with guest agent
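
The backpressure and scale-out decision above can be sketched as follows. This is a minimal illustration, not the PR's FunctionPool: `Pool`, `Instance`, `ErrPoolTimeout`, and `AcquireWithin` are hypothetical names, and instance startup is reduced to allocating a struct.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// Instance is a hypothetical stand-in for a warm container or VM.
type Instance struct{ ID int }

// ErrPoolTimeout is returned when no instance frees up before the deadline.
var ErrPoolTimeout = errors.New("pool: timed out waiting for a warm instance")

// Pool is an illustrative per-function pool, not the PR's FunctionPool.
type Pool struct {
	mu      sync.Mutex
	maxSize int
	total   int            // instances created so far (warm + busy)
	warm    chan *Instance // idle instances ready for reuse
}

func NewPool(maxSize int) *Pool {
	return &Pool{maxSize: maxSize, warm: make(chan *Instance, maxSize)}
}

// Acquire prefers a warm instance, spawns one while below maxSize,
// and otherwise waits until an instance is released or ctx is done.
func (p *Pool) Acquire(ctx context.Context) (*Instance, error) {
	select {
	case inst := <-p.warm: // fast path: reuse a warm instance
		return inst, nil
	default:
	}

	p.mu.Lock()
	if p.total < p.maxSize { // scale-out: there is room to spawn
		p.total++
		inst := &Instance{ID: p.total}
		p.mu.Unlock()
		return inst, nil
	}
	p.mu.Unlock()

	select { // backpressure: wait for a Release or give up
	case inst := <-p.warm:
		return inst, nil
	case <-ctx.Done():
		return nil, ErrPoolTimeout
	}
}

// Release returns an instance to the warm set. It does not block as long
// as each acquired instance is released once (channel capacity is maxSize).
func (p *Pool) Release(inst *Instance) { p.warm <- inst }

// AcquireWithin is a convenience wrapper that bounds the wait in milliseconds.
func (p *Pool) AcquireWithin(ms int) (*Instance, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(ms)*time.Millisecond)
	defer cancel()
	return p.Acquire(ctx)
}

func main() {
	p := NewPool(1)
	a, _ := p.AcquireWithin(10)   // spawns the only instance
	_, err := p.AcquireWithin(10) // pool is full, so this call times out
	fmt.Println(a.ID, err)
	p.Release(a)
	b, _ := p.AcquireWithin(10) // the released instance is reused
	fmt.Println(b.ID)
}
```

A buffered channel keeps the warm set simple here; a real pool would also need to track busy and starting instances and destroy failed ones.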

Backend Support Matrix

| Backend | Pool Support | Exec Mechanism |
| --- | --- | --- |
| Docker | Full | ContainerExecCreate/Attach |
| Libvirt | Full | Serial console via DomainOpenConsole |
| Firecracker | Full | VSOCK + guest agent (requires guest setup) |

Test Plan

  • Create function with pool_config: {min_size: 2, max_size: 5}
  • Invoke 5 times rapidly, verify container/VM reuse
  • Verify cold start time drops after first invocation
  • Update function handler, verify old warm containers destroyed
  • Test function delete cleans up pool
  • Test Libvirt warm pool with serial console exec
  • Test Firecracker warm pool with vsock guest agent
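
For the first test-plan step, a pool_config payload could be decoded and sanity-checked roughly as below. This is a sketch under assumptions: the `min_size` and `max_size` keys come from the test plan, while the `max_idle_seconds` key, the `Validate` rules, and `ParsePoolConfig` are hypothetical and may differ from the PR's PoolConfig.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// PoolConfig mirrors the pool_config payload used in the test plan.
type PoolConfig struct {
	MinSize        int `json:"min_size"`
	MaxSize        int `json:"max_size"`
	MaxIdleSeconds int `json:"max_idle_seconds,omitempty"` // assumed field name
}

// Validate applies illustrative bounds checks; the real rules may differ.
func (c PoolConfig) Validate() error {
	switch {
	case c.MinSize < 0:
		return errors.New("min_size must not be negative")
	case c.MaxSize < 1:
		return errors.New("max_size must be at least 1")
	case c.MinSize > c.MaxSize:
		return errors.New("min_size must not exceed max_size")
	case c.MaxIdleSeconds < 0:
		return errors.New("max_idle_seconds must not be negative")
	}
	return nil
}

// ParsePoolConfig decodes a JSON document and validates the result.
func ParsePoolConfig(raw string) (PoolConfig, error) {
	var cfg PoolConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return PoolConfig{}, fmt.Errorf("decode pool_config: %w", err)
	}
	if err := cfg.Validate(); err != nil {
		return PoolConfig{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := ParsePoolConfig(`{"min_size": 2, "max_size": 5}`)
	fmt.Printf("%+v %v\n", cfg, err)
}
```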

Notes

  • Docker: Fully working - exec uses native Docker API
  • Libvirt: Uses serial console - VM cloud-init must expose shell on /dev/ttyS0
  • Firecracker: Uses VSOCK - guest agent must be embedded in rootfs and bridge vsock to Unix socket
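
As a rough picture of what a guest agent has to do, the sketch below handles newline-delimited commands and runs them directly, without a shell. It is not the code in cmd/firecracker-agent: `HandleLine`, `Serve`, `Runner`, and the `allowed` set are hypothetical, and a plain `io.ReadWriter` stands in for the vsock connection.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os/exec"
	"strings"
)

// Runner executes a command and returns its output; tests can swap it out.
type Runner func(name string, args ...string) ([]byte, error)

// execRunner runs the binary directly, so no shell ever parses the arguments.
func execRunner(name string, args ...string) ([]byte, error) {
	return exec.Command(name, args...).CombinedOutput()
}

// allowed is a hypothetical entrypoint allowlist.
var allowed = map[string]bool{"echo": true, "python3": true, "node": true}

// HandleLine processes one newline-delimited request and returns the reply.
func HandleLine(line string, run Runner) string {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return "ERROR empty command\n"
	}
	if !allowed[fields[0]] {
		return fmt.Sprintf("ERROR entrypoint %q not allowed\n", fields[0])
	}
	out, err := run(fields[0], fields[1:]...)
	if err != nil {
		return fmt.Sprintf("ERROR %v\n", err)
	}
	return string(out)
}

// Serve answers requests from rw until EOF. In the guest, rw would be
// the accepted vsock connection.
func Serve(rw io.ReadWriter, run Runner) error {
	sc := bufio.NewScanner(rw)
	for sc.Scan() {
		if _, err := io.WriteString(rw, HandleLine(sc.Text(), run)); err != nil {
			return err
		}
	}
	return sc.Err()
}

func main() {
	// Assumes an echo binary is on PATH.
	fmt.Print(HandleLine("echo hello", execRunner))
	// Rejected before anything is executed.
	fmt.Print(HandleLine("sh -c ls", execRunner))
}
```

Splitting on whitespace is a simplification; arguments containing spaces would need a real framing format such as JSON or length-prefixed fields.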

Future Enhancements

  • QEMU Guest Agent for Libvirt (more robust than serial)
  • SSH-based exec for Firecracker as alternative to vsock
  • Provisioned concurrency / min instances pre-warming

Summary by CodeRabbit

Release Notes

  • New Features
    • Warm instance pooling: Functions can now maintain pools of pre-initialized instances to reduce cold-start latency.
    • Pool configuration: Configure minimum/maximum instance pool sizes and maximum idle time per function.
    • Enhanced command execution security: Allowlist-based entrypoint validation and pattern-based argument sanitization.

Copilot AI review requested due to automatic review settings May 18, 2026 20:41
coderabbitai Bot commented May 18, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: db4de8bb-099d-4360-9ee7-e3cedc3c3c04

📥 Commits

Reviewing files that changed from the base of the PR and between 60d2023 and c11b571.

📒 Files selected for processing (45)
  • .github/workflows/ci.yml
  • .golangci.yml
  • cmd/cloud/container.go
  • cmd/cloud/cron.go
  • cmd/cloud/dns.go
  • cmd/cloud/events.go
  • cmd/cloud/function.go
  • cmd/cloud/function_schedule.go
  • cmd/cloud/gateway.go
  • cmd/cloud/iac.go
  • cmd/cloud/igw.go
  • cmd/cloud/instance.go
  • cmd/cloud/instance_type.go
  • cmd/cloud/kubernetes.go
  • cmd/cloud/loadbalancer.go
  • cmd/cloud/logs.go
  • cmd/cloud/nat_gateway.go
  • cmd/cloud/notify.go
  • cmd/cloud/queue.go
  • cmd/cloud/route_table.go
  • cmd/cloud/secrets.go
  • cmd/cloud/sg.go
  • cmd/cloud/snapshot.go
  • cmd/cloud/storage.go
  • cmd/cloud/storage_lifecycle.go
  • cmd/cloud/subnet.go
  • cmd/cloud/tenant.go
  • cmd/cloud/volume.go
  • cmd/cloud/vpc.go
  • cmd/cloud/vpc_peering.go
  • cmd/firecracker-agent/main.go
  • internal/core/services/identity_hash_test.go
  • internal/core/services/image_unit_test.go
  • internal/csi/driver_test.go
  • internal/handlers/admin_handler_test.go
  • internal/platform/config_test.go
  • internal/repositories/k8s/node_executor_test.go
  • internal/repositories/libvirt/adapter.go
  • internal/repositories/libvirt/adapter_unit_test.go
  • internal/repositories/libvirt/real_client.go
  • internal/repositories/postgres/function_repo_unit_test.go
  • internal/storage/node/store_test.go
  • tests/database_advanced_e2e_test.go
  • tests/gateway_e2e_test.go
  • tests/instance_types_e2e_test.go
📝 Walkthrough

This PR implements warm container pooling for serverless-style function execution. It introduces a pool manager that maintains warm instances per function, scales based on demand, reaps idle instances, and integrates with function invocation to prefer warm execution over cold starts. The implementation spans pool contracts and manager, command validation, a Firecracker guest agent, backend-specific pool methods (Docker, Firecracker with vsock, libvirt with console), database schema, and service integration.
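
The idle-reaping rule mentioned in the walkthrough can be illustrated with a pure function that decides which warm instances to drop. This is a sketch, not the PR's reaper: `Warm` and `ReapIdle` are hypothetical names, and the caller is assumed to compute each instance's idle duration (for example with `time.Since`) on every reaper tick.

```go
package main

import (
	"fmt"
	"time"
)

// Warm is a hypothetical record for an idle pooled instance.
type Warm struct {
	ID   string
	Idle time.Duration // time since the instance was last released
}

// ReapIdle splits warm instances into those to keep and those to destroy.
// An instance is reaped only if it has been idle longer than maxIdle and
// removing it would not drop the pool below minSize.
func ReapIdle(warm []Warm, maxIdle time.Duration, minSize int) (kept, reaped []Warm) {
	remaining := len(warm)
	for _, inst := range warm {
		if remaining > minSize && inst.Idle > maxIdle {
			reaped = append(reaped, inst)
			remaining--
			continue
		}
		kept = append(kept, inst)
	}
	return kept, reaped
}

func main() {
	warm := []Warm{
		{ID: "a", Idle: 10 * time.Minute},
		{ID: "b", Idle: time.Second},
		{ID: "c", Idle: 20 * time.Minute},
	}
	kept, reaped := ReapIdle(warm, 5*time.Minute, 2)
	fmt.Println(len(kept), len(reaped))
}
```

Keeping the decision separate from the ticker loop and the backend delete call makes the MinSize floor easy to unit-test without timing dependencies.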

Changes

Warm Container Pool Implementation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Pool contract types and compute backend extension | internal/core/ports/pool.go, internal/core/ports/compute.go | Pool sizing, instance metadata, stats counters, and PoolManager interface defined; ComputeBackend gains StartPoolInstance, ExecInInstance, GetInstanceReady methods. |
| Function entity pool support | internal/core/domain/function.go | Function and FunctionUpdate models gain optional PoolConfig field; SetColumns() conditionally includes pool_config in SQL UPDATE. |
| Pool manager and instance lifecycle | internal/core/pool/pool.go | Manager coordinates pool map and in-flight tracking; FunctionPool maintains warm/busy/starting instances with RWMutex; Acquire implements backpressure, async scaling, polling; release returns-to-pool or destroys on error; reaper deletes idle instances every 30s while enforcing MinSize; Stop gracefully shuts down pools. |
| Pool manager test suite | internal/core/pool/pool_test.go | Tests verify registration, config validation, acquisition/release transitions, concurrent operations, invalidation, shutdown with in-flight tracking, stats, idle reaping with timing, and error-based destruction. |
| Command entrypoint and argument validation | internal/platform/command_validator.go, internal/platform/command_validator_test.go | CommandValidator enforces allowlisted entrypoints and detects dangerous argument patterns (traversal, shell operators, substitution); ValidateWithRuntime adds runtime-specific checks; SanitizeArgs removes harmful constructs. |
| Firecracker guest-side agent and vsock server | cmd/firecracker-agent/main.go | Linux/amd64 agent listens on AF_VSOCK (CID 3, port 3), parses newline-delimited commands, validates via allowlist, executes via exec.Command without shell interpolation, returns output or ERROR responses. |
| Database schema and function repository pool support | internal/repositories/postgres/migrations/114_add_pool_config.{up,down}.sql, internal/repositories/postgres/function_repo.go | Migration adds pool_config JSONB column; queries select and scan pool_config; Update marshals PoolConfig to JSON. |
| Docker adapter pool methods | internal/repositories/docker/adapter.go | StartPoolInstance pulls image and creates warm container with tail keep-alive; ExecInInstance runs command via Docker exec; GetInstanceReady reports running state. |
| Firecracker adapter pool methods and vsock wiring | internal/repositories/firecracker/adapter.go | Config adds GuestCID field (default 3); VM setup configures VsockDevices with host Unix socket; StartPoolInstance launches keep-alive microVM; ExecInInstance dials vsock and reads output with timeout; GetInstanceReady checks machine registry. |
| Libvirt adapter pool methods, console interface, and domain templates | internal/repositories/libvirt/adapter.go, internal/repositories/libvirt/libvirt_client.go, internal/repositories/libvirt/real_client.go, internal/repositories/libvirt/templates.go | Adapter adds StartPoolInstance (cloud-init domain), ExecInInstance (serial console polling), GetInstanceReady (domain state); libvirt_client adds ConsoleFlags and DomainOpenConsole; real_client implements console via os.Pipe; templates add serial/console/guest-agent channel. |
| No-op and resilient compute adapter pool methods | internal/repositories/noop/adapters.go, internal/platform/resilient_compute.go, internal/platform/resilient_compute_test.go | NoopComputeBackend returns fixed success values; ResilientCompute delegates pool methods directly to inner backend. |
| FunctionService pool wiring and pooled invocation | internal/core/services/function.go, internal/api/setup/dependencies.go | NewFunctionService accepts poolMgr; on create/update/delete manages pool registration/invalidation; runInvocation prefers warm pool, falling back to cold start; runPooledInvocation validates command and executes via ExecInInstance; buildTaskOptionsForPool produces pool-specific options. |
| Service and test infrastructure pool updates | internal/core/services/benchmarks_test.go, internal/core/services/function_unit_test.go, internal/core/services/function_internal_test.go, internal/core/services/mock_compute_test.go | Benchmarks and unit tests construct and wire pool manager; test doubles gain StartPoolInstance/ExecInInstance/GetInstanceReady stubs. |
| Test double and mock infrastructure updates | internal/handlers/admin_handler_test.go, internal/workers/database_failover_worker_test.go, internal/workers/pipeline_worker_test.go, internal/repositories/libvirt/lb_proxy_test.go, internal/repositories/libvirt/mock_client_test.go | Test mocks updated to implement pool methods. |
| Linting configuration and error-check suppressions | .golangci.yml, cmd/cloud/container.go, cmd/cloud/cron.go, internal/core/services/identity_hash_test.go, internal/csi/driver_test.go, internal/platform/config_test.go, internal/repositories/firecracker/adapter_test.go | Adds .golangci.yml; test files explicitly suppress errcheck for environment and I/O operations. |
| Dependency management | go.mod | golang.org/x/sys v0.41.0 promoted to direct dependency. |
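
The warm-first invocation path summarized above (prefer a pooled instance, fall back to a cold start, destroy on error) could look roughly like this. The `Invoker` struct and its function fields are hypothetical stand-ins for the pool manager and compute backend, chosen so the control flow can be shown without the real interfaces.

```go
package main

import "fmt"

// Invoker bundles the hooks this sketch needs. All fields are hypothetical.
type Invoker struct {
	Acquire  func() (id string, ok bool)                    // ok is false when no warm instance is available
	Release  func(id string, failed bool)                   // failed instances are destroyed, not reused
	ExecWarm func(id string, cmd []string) (string, error) // run inside an existing instance
	RunCold  func(cmd []string) (string, error)            // start a fresh instance for this call
}

// Invoke prefers the warm pool and falls back to a cold start.
// The returned path is "warm" or "cold" so callers can record which was taken.
func (inv Invoker) Invoke(cmd []string) (out, path string, err error) {
	if inv.Acquire != nil {
		if id, ok := inv.Acquire(); ok {
			out, err = inv.ExecWarm(id, cmd)
			inv.Release(id, err != nil)
			return out, "warm", err
		}
	}
	out, err = inv.RunCold(cmd)
	return out, "cold", err
}

func main() {
	warmFree := true
	inv := Invoker{
		Acquire:  func() (string, bool) { return "inst-1", warmFree },
		Release:  func(id string, failed bool) {},
		ExecWarm: func(id string, cmd []string) (string, error) { return "ran on " + id, nil },
		RunCold:  func(cmd []string) (string, error) { return "ran on fresh instance", nil },
	}
	fmt.Println(inv.Invoke([]string{"node", "index.js"}))
	warmFree = false
	fmt.Println(inv.Invoke([]string{"node", "index.js"}))
}
```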

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • poyrazK/thecloud#582: This PR implements the warm container pooling feature requested in the issue, adding per-function pool management with Acquire/Release, idle reaping, and backend-specific execution paths.

Possibly related PRs

  • poyrazK/thecloud#154: Both PRs modify internal/platform/resilient_compute.go and may conflict or build upon each other regarding resilience wrapper method coverage.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 13.64% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat(serverless): implement warm container pool for function reuse' accurately summarizes the main feature addition, introducing warm container pooling to reduce cold starts. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings May 18, 2026 21:09

poyrazK added a commit that referenced this pull request May 19, 2026
- pool.go: fix variable shadowing (pool -> existingPool/newPool)
- pool.go: remove unused deadline variables in waitForWarmInstance/waitWithBackpressure
- libvirt/adapter.go: replace fragile shell-prompt detection with gap-based (1s) exec completion
- postgres: add migration 114 for pool_config column
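
The gap-based completion idea from the libvirt change can be sketched as: keep reading console output and treat the command as finished once no bytes arrive for a fixed interval. The code below is an illustration, not the adapter's implementation; `ReadUntilGap` and `slowReader` are hypothetical, and the demo uses a shorter gap than the 1s named in the commit message.

```go
package main

import (
	"fmt"
	"io"
	"time"
)

// ReadUntilGap collects output from r until no data arrives for gap,
// or until r reports an error such as io.EOF.
func ReadUntilGap(r io.Reader, gap time.Duration) []byte {
	chunks := make(chan []byte)
	done := make(chan struct{})
	defer close(done) // lets the reader goroutine stop trying to send

	go func() {
		defer close(chunks)
		buf := make([]byte, 4096)
		for {
			n, err := r.Read(buf)
			if n > 0 {
				select {
				case chunks <- append([]byte(nil), buf[:n]...):
				case <-done:
					return
				}
			}
			if err != nil {
				return
			}
		}
	}()

	var out []byte
	for {
		select {
		case c, ok := <-chunks:
			if !ok {
				return out // stream ended
			}
			out = append(out, c...)
		case <-time.After(gap): // a new timer per iteration restarts the gap
			return out
		}
	}
}

// slowReader is a demo source: it yields each chunk after a delay and then
// goes quiet without EOF, like an idle serial console.
type slowReader struct {
	chunks []string
	delay  time.Duration
	next   int
}

func (s *slowReader) Read(p []byte) (int, error) {
	if s.next >= len(s.chunks) {
		time.Sleep(time.Hour)
		return 0, io.EOF
	}
	time.Sleep(s.delay)
	n := copy(p, s.chunks[s.next])
	s.next++
	return n, nil
}

func main() {
	r := &slowReader{chunks: []string{"hello ", "world\n"}, delay: 10 * time.Millisecond}
	fmt.Printf("%q\n", ReadUntilGap(r, 200*time.Millisecond))
}
```

One trade-off of this approach is that a command which pauses longer than the gap mid-output is cut short, and the reader goroutine stays blocked in Read until the underlying stream is closed.
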
Copilot AI review requested due to automatic review settings May 19, 2026 12:14

Copilot AI review requested due to automatic review settings May 19, 2026 12:34

poyrazK added a commit that referenced this pull request May 19, 2026
Security hardening for PR #594 findings:

1. Firecracker agent: replace shell execution (sh -c) with
   direct exec + command validation. Block dangerous patterns
   (shell operators, path traversal, command substitution).

2. FunctionService: add CommandValidator to validate commands
   before ExecInInstance in pooled invocation path.

3. Add unit tests for pool manager and command validator.

4. Fix mockCompute in resilient_compute_test to implement
   the new pool interface methods (StartPoolInstance,
   ExecInInstance, GetInstanceReady).
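
A validation step of the kind described in items 1 and 2 might look like the following. This is a minimal sketch and not the PR's CommandValidator: the allowlist entries, the pattern list, and `ValidateCommand` are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// allowedEntrypoints is a hypothetical allowlist of runtime binaries.
var allowedEntrypoints = map[string]struct{}{
	"node":    {},
	"python3": {},
}

// forbidden lists substrings that suggest shell operators, command
// substitution, or path traversal. The list is illustrative, not exhaustive.
var forbidden = []string{"..", ";", "|", "&", "`", "$(", ">", "<"}

// ValidateCommand rejects commands whose entrypoint is not allowlisted
// or whose arguments contain a forbidden pattern.
func ValidateCommand(cmd []string) error {
	if len(cmd) == 0 {
		return errors.New("empty command")
	}
	if _, ok := allowedEntrypoints[cmd[0]]; !ok {
		return fmt.Errorf("entrypoint %q is not allowlisted", cmd[0])
	}
	for _, arg := range cmd[1:] {
		for _, pat := range forbidden {
			if strings.Contains(arg, pat) {
				return fmt.Errorf("argument %q contains forbidden pattern %q", arg, pat)
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(ValidateCommand([]string{"python3", "handler.py"}))
	fmt.Println(ValidateCommand([]string{"sh", "-c", "ls"}))
	fmt.Println(ValidateCommand([]string{"node", "$(whoami)"}))
}
```

Substring checks like these are defense in depth; executing the binary directly rather than through `sh -c` is what actually prevents shell interpretation of the arguments.
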
Copilot AI review requested due to automatic review settings May 19, 2026 15:55

Copilot AI review requested due to automatic review settings May 19, 2026 16:13

Copilot AI review requested due to automatic review settings May 19, 2026 16:19

poyrazK added 2 commits May 21, 2026 14:53
CI was failing because golangci-lint v1.64.8 was being used but the
config file uses version: "2" format. Upgrade CI to v2.3.2.
Copilot AI review requested due to automatic review settings May 21, 2026 12:04

poyrazK added 2 commits May 21, 2026 15:13
The function repository was updated to include pool_config column but
the unit tests were not updated to match.
The golangci-lint v2 import path is github.com/golangci/golangci-lint/v2/cmd/golangci-lint
Copilot AI review requested due to automatic review settings May 21, 2026 12:30

Copilot AI review requested due to automatic review settings May 21, 2026 12:52

Copilot AI review requested due to automatic review settings May 21, 2026 13:31

Copilot AI review requested due to automatic review settings May 21, 2026 13:51
