Increase tpa-pgsql-bee memory limits to fix OOMKills #23

Merged
rhopp merged 1 commit into redhat-appstudio:main from rhopp:fix/increase-tpa-pgsql-memory
Feb 25, 2026
rhopp (Collaborator) commented Feb 25, 2026

The PostgreSQL pod (tpa-pgsql-bee) is repeatedly OOM-killed: the 25GB database with 50 concurrent connections consistently uses 900-980MB, leaving almost no headroom under the 1Gi limit. This causes SBOM upload failures with "Connection pool timed out" errors.

Increase memory limit from 1Gi to 4Gi and request from 512Mi to 2Gi. Also bump CPU limit/request slightly as observed usage (350m) was already above the old 250m request.
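
For orientation, the change described above amounts to a container `resources` stanza along the following lines. This is a sketch rather than a copy of the diff: the memory values and the old 250m CPU request come from this description, the new CPU values (500m request, 2 limit) come from the CodeRabbit walkthrough on this PR, and the surrounding field layout is assumed.

```yaml
# Sketch of the tpa-pgsql-bee container resources after this change.
# Surrounding Deployment fields are omitted; layout is assumed, not copied from the diff.
resources:
  requests:
    cpu: 500m      # was 250m; observed usage was about 350m
    memory: 2Gi    # was 512Mi
  limits:
    cpu: "2"       # was 1
    memory: 4Gi    # was 1Gi; observed usage was 900-980MB
```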

Summary by CodeRabbit

  • Chores
    • Enhanced database infrastructure resource allocation to improve system performance and reliability during peak usage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai Bot commented Feb 25, 2026

Walkthrough

Resource allocations for the tpa-pgsql-bee container are increased in the Kubernetes Deployment manifest. CPU limits doubled from 1 to 2, memory limits increased from 1Gi to 4Gi, CPU requests doubled from 250m to 500m, and memory requests increased from 512Mi to 2Gi.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Resource Configuration (`components/trust-apps/tpa/infrastructure.yaml`) | Increased tpa-pgsql-bee container resource limits (CPU: 1→2, memory: 1Gi→4Gi) and requests (CPU: 250m→500m, memory: 512Mi→2Gi). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and accurately describes the main change: increasing memory limits for the tpa-pgsql-bee container to resolve OOMKill issues, which aligns with the primary objective of the PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai Bot left a comment

🧹 Nitpick comments (1)
components/trust-apps/tpa/infrastructure.yaml (1)

82-86: Add post-rollout guardrails to confirm right-sizing.

After deploy, monitor restart count and memory working set for tpa-pgsql-bee for a few days; if stable, consider a VPA recommendation loop to keep requests/limits data-driven over time.
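
One way to make this monitoring concrete is a pair of alert rules. The sketch below is hypothetical and not part of this PR. It assumes the Prometheus Operator (`PrometheusRule` CRD) is installed, that kube-state-metrics and cAdvisor metrics are scraped, and that the container label is `tpa-pgsql-bee`. The rule names, the 85% threshold, and the time windows are illustrative choices, not values from the review.

```yaml
# Hypothetical guardrail alerts for tpa-pgsql-bee (names and thresholds are assumptions).
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: tpa-pgsql-bee-guardrails
spec:
  groups:
    - name: tpa-pgsql-bee.rules
      rules:
        # Fires if the container restarted at all during the last hour.
        - alert: TpaPgsqlBeeRestarting
          expr: increase(kube_pod_container_status_restarts_total{container="tpa-pgsql-bee"}[1h]) > 0
          labels:
            severity: warning
          annotations:
            summary: tpa-pgsql-bee restarted within the last hour
        # Fires if the memory working set stays above 85% of the limit for 30 minutes.
        - alert: TpaPgsqlBeeMemoryNearLimit
          expr: |
            max by (namespace, pod) (container_memory_working_set_bytes{container="tpa-pgsql-bee"})
              /
            max by (namespace, pod) (kube_pod_container_resource_limits{container="tpa-pgsql-bee", resource="memory"})
              > 0.85
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: tpa-pgsql-bee memory working set is above 85% of its limit
```

Adding a namespace matcher to both expressions would avoid matching similarly named containers elsewhere in the cluster.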

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/trust-apps/tpa/infrastructure.yaml` around lines 82 - 86, After
deploying the resource with the current cpu/memory limits and requests for
tpa-pgsql-bee, add a post-rollout guardrail: monitor the Pod/Deployment
tpa-pgsql-bee for restartCount and container_memory_working_set_bytes for
several days and create alerting rules (e.g., prometheus alerts) for unusual
restarts or sustained memory pressure; if metrics remain stable for the
observation window, enable a VPA recommendation loop to propose adjustments to
the cpu/memory requests/limits and automate a safe rollout (dry-run VPA or
PR-based changes) to keep size data-driven over time.
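
The "dry-run VPA" mentioned in the prompt could look roughly like the manifest below. It is a sketch under assumptions: the Vertical Pod Autoscaler components must be installed in the cluster, the object name is made up, and the target is assumed to be a Deployment named `tpa-pgsql-bee`. With `updateMode: "Off"` the VPA only publishes recommendations in its status and never evicts or resizes pods.

```yaml
# Hypothetical recommendation-only VPA (not part of this PR).
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: tpa-pgsql-bee-vpa        # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tpa-pgsql-bee          # assumed Deployment name
  updatePolicy:
    updateMode: "Off"            # recommend only; do not apply changes to pods
```

The recommended requests then appear under `status.recommendation.containerRecommendations` and can be turned into a follow-up PR by hand.
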

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5fe63e0 and a857407.

📒 Files selected for processing (1)
  • components/trust-apps/tpa/infrastructure.yaml

@rhopp rhopp merged commit a32086e into redhat-appstudio:main Feb 25, 2026
2 checks passed