y0s3ph/kube-cost-lens


kube-cost-lens

Lightweight Kubernetes FinOps CLI — analyze resource usage, detect waste, and get actionable rightsizing recommendations without installing anything in your cluster.

License Python 3.11+


The Problem

Kubernetes clusters are notoriously over-provisioned. Teams set generous CPU and memory requests "just in case," and nobody revisits them. The result:

  • 30-60% of requested resources sit idle in a typical cluster.
  • Platform teams lack visibility into which workloads are wasteful.
  • FinOps conversations happen in spreadsheets, disconnected from the actual cluster state.
  • Existing tools (Kubecost, CAST AI, etc.) require in-cluster agents, are commercial, or demand heavy setup.

kube-cost-lens solves this by providing a zero-install, read-only CLI that connects to your cluster, analyzes real usage vs. declared requests, and outputs clear, actionable recommendations.

Core Principles

Principle            Description
Zero footprint       Nothing gets installed in the cluster. Read-only access via kubeconfig.
Actionable output    Every recommendation includes the exact patch (resource values) to apply.
No vendor lock-in    Works with any Kubernetes cluster (EKS, GKE, AKS, on-prem, k3s, etc.).
Prometheus-optional  Works with the Kubernetes Metrics API by default; Prometheus integration for historical analysis.
CI/CD friendly       Machine-readable output (JSON, YAML) for pipeline integration.

Features (Planned)

Phase 1 — Core Analysis

  • Connect to any cluster via kubeconfig / context selection
  • Scan all namespaces (or filtered subset) for Deployments, StatefulSets, DaemonSets, Jobs, CronJobs
  • Compare resource requests and limits vs. actual usage (via Metrics API)
  • Calculate a waste score per workload (percentage of requested resources left unused)
  • Classify workloads: oversized, undersized, right-sized, no-requests-set, no-limits-set
  • Generate per-workload rightsizing recommendations with suggested values
  • Output: table (terminal), JSON, YAML, CSV
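
The waste score and classification steps above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `Usage` dataclass, the averaging of CPU and memory waste into one score, and the `oversized_above` threshold are all assumptions made for the example, and the no-limits-set class is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Usage:
    """Requested vs. observed amount of one resource, both in the same unit."""

    requested: float
    used: float


def waste_percent(usage: Usage) -> Optional[float]:
    """Share of the request that is unused, in percent; None if no request is set."""
    if usage.requested <= 0:
        return None
    return max(0.0, (usage.requested - usage.used) / usage.requested * 100)


def classify(cpu: Usage, memory: Usage, oversized_above: float = 40.0) -> str:
    """Classify a workload; the threshold and the averaging are hypothetical choices."""
    cpu_waste = waste_percent(cpu)
    memory_waste = waste_percent(memory)
    if cpu_waste is None or memory_waste is None:
        return "no-requests-set"
    if cpu.used > cpu.requested or memory.used > memory.requested:
        return "undersized"
    # Assumption: a single score taken as the plain average of both resources.
    score = (cpu_waste + memory_waste) / 2
    return "oversized" if score >= oversized_above else "right-sized"
```

For example, `classify(Usage(0.5, 0.045), Usage(1024, 128))` returns "oversized", since roughly 91% of the CPU request and 87.5% of the memory request are unused.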

Phase 2 — Namespace & Cluster Aggregation

  • Aggregate waste by namespace with total cost impact estimation
  • Cluster-level summary dashboard in terminal (Rich/Textual)
  • Support custom cost-per-cpu and cost-per-gb-memory inputs for cost estimation
  • ResourceQuota analysis: how much of the namespace quota is actually consumed
  • Detect namespaces without ResourceQuotas (governance gap)
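
One way the cost estimation and namespace aggregation could work is sketched below. The function names, the row layout, and the 730-hour month are assumptions for illustration; the rates correspond to the custom cost-per-cpu and cost-per-gb-memory inputs.

```python
from collections import defaultdict

HOURS_PER_MONTH = 730  # Assumption: average month (8760 hours / 12).


def monthly_waste_cost(
    cpu_requested_cores: float,
    cpu_used_cores: float,
    mem_requested_gb: float,
    mem_used_gb: float,
    cpu_cost_hour: float,
    memory_cost_gb_hour: float,
) -> float:
    """Estimated monthly cost of the idle (requested but unused) CPU and memory."""
    idle_cpu = max(0.0, cpu_requested_cores - cpu_used_cores)
    idle_mem = max(0.0, mem_requested_gb - mem_used_gb)
    hourly = idle_cpu * cpu_cost_hour + idle_mem * memory_cost_gb_hour
    return hourly * HOURS_PER_MONTH


def waste_by_namespace(rows, cpu_cost_hour: float, memory_cost_gb_hour: float) -> dict:
    """Sum waste cost per namespace.

    Each row is a hypothetical tuple:
    (namespace, cpu_requested, cpu_used, mem_requested_gb, mem_used_gb).
    """
    totals = defaultdict(float)
    for namespace, cpu_req, cpu_used, mem_req, mem_used in rows:
        totals[namespace] += monthly_waste_cost(
            cpu_req, cpu_used, mem_req, mem_used, cpu_cost_hour, memory_cost_gb_hour
        )
    # Round once at the end so per-row rounding errors do not accumulate.
    return {namespace: round(cost, 2) for namespace, cost in totals.items()}
```

Usage above the request is clamped to zero idle capacity, so an undersized workload does not offset waste elsewhere in its namespace.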

Phase 3 — Historical Analysis (Prometheus)

  • Connect to Prometheus / Thanos / VictoriaMetrics
  • Analyze usage patterns over configurable time windows (7d, 30d, 90d)
  • Detect workloads with periodic spikes (candidates for HPA)
  • Detect workloads with flat low usage (candidates for aggressive downsizing)
  • P95/P99 usage-based recommendations (not just average)
  • Time-series waste trend: is the cluster getting more or less efficient?
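
To illustrate why percentile-based recommendations differ from averages, here is a small sketch using the nearest-rank percentile. The `headroom` multiplier and both function names are assumptions; the README does not specify how the recommendation is derived from the percentile.

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of usage samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Nearest-rank: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_request(samples: list, pct: float = 95.0, headroom: float = 1.15) -> float:
    """Suggested request: chosen percentile plus a hypothetical safety margin."""
    return percentile(samples, pct) * headroom
```

With 19 samples at 100m and a single spike at 1000m, the average is 145m and the maximum is 1000m, while P95 stays at 100m. A workload like this is a candidate for a modest request plus autoscaling rather than a request sized for the spike.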

Phase 4 — Automation & Integration

  • Generate Kubernetes patches (JSON patch format) ready to apply
  • Generate Kustomize overlays with recommended values
  • CI mode: exit with a non-zero code if waste exceeds a configurable threshold
  • GitHub Actions integration example
  • Slack/webhook notifications for periodic reports
  • VPA (Vertical Pod Autoscaler) recommendation comparison: kube-cost-lens vs. VPA
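
A generated patch in JSON Patch format (RFC 6902) might look like the output of the sketch below. The function name and its parameters are hypothetical. The path shown applies to Deployments, StatefulSets, and DaemonSets; CronJobs nest the pod template one level deeper, under `/spec/jobTemplate`.

```python
import json


def build_resource_patch(container_index: int, cpu_request: str, memory_request: str) -> list:
    """Build JSON Patch operations that set a container's CPU and memory requests.

    Uses "replace", which assumes the container already declares both requests;
    a workload with no requests set would need "add" operations instead.
    """
    base = f"/spec/template/spec/containers/{container_index}/resources/requests"
    return [
        {"op": "replace", "path": f"{base}/cpu", "value": cpu_request},
        {"op": "replace", "path": f"{base}/memory", "value": memory_request},
    ]


if __name__ == "__main__":
    # Example values only; real recommendations would come from the analyzer.
    print(json.dumps(build_resource_patch(0, "400m", "1536Mi"), indent=2))
```

A file containing this output could then be applied with something like `kubectl patch deployment api-gateway -n production --type=json --patch-file patch.json`, after review.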

Architecture

┌─────────────────────────────────────────────────────────┐
│                    kube-cost-lens CLI                    │
│                                                         │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐  │
│  │   Scanner    │  │   Analyzer   │  │   Reporter     │  │
│  │             │  │              │  │                │  │
│  │ - K8s API   │  │ - Waste calc │  │ - Table        │  │
│  │ - Metrics   │  │ - Classify   │  │ - JSON/YAML    │  │
│  │ - Prometheus│  │ - Recommend  │  │ - CSV          │  │
│  │             │  │ - Score      │  │ - CI exit code │  │
│  └──────┬──────┘  └──────┬───────┘  └───────┬────────┘  │
│         │                │                   │           │
│         └────────────────┴───────────────────┘           │
│                          │                               │
└──────────────────────────┼───────────────────────────────┘
                           │
              ┌────────────┼────────────────┐
              │            │                │
              ▼            ▼                ▼
        ┌──────────┐ ┌──────────┐   ┌─────────────┐
        │  K8s API │ │ Metrics  │   │ Prometheus  │
        │          │ │   API    │   │  (optional) │
        └──────────┘ └──────────┘   └─────────────┘

Component Responsibilities

  • Scanner: Connects to the cluster and collects workload specs (requests, limits) and real-time metrics. Handles kubeconfig, context switching, and namespace filtering.
  • Analyzer: Compares declared resources against actual usage. Calculates waste scores, classifies workloads, and generates rightsizing recommendations using configurable strategies (average, P95, P99).
  • Reporter: Formats the analysis output. Supports multiple formats and handles CI/CD exit codes based on configurable thresholds.
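
Before the Analyzer can compare requests with usage, the Scanner's values need a common unit: workload specs use quantities such as 500m or 4Gi, while the Metrics API commonly reports CPU in nanocores (for example 156340n) and memory in Ki. A minimal sketch of that normalization, with hypothetical function names and only the most common suffixes handled:

```python
# Divisors for fractional CPU suffixes (nano, micro, milli).
_CPU_DIVISORS = {"n": 1e9, "u": 1e6, "m": 1e3}

# Binary and decimal memory suffixes; larger ones (Pi, Ei, P, E) are omitted here.
_MEMORY_FACTORS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_cores(quantity: str) -> float:
    """Convert a CPU quantity such as '500m', '156340n', or '2' to cores."""
    suffix = quantity[-1:]
    if suffix in _CPU_DIVISORS:
        return float(quantity[:-1]) / _CPU_DIVISORS[suffix]
    return float(quantity)


def memory_to_bytes(quantity: str) -> float:
    """Convert a memory quantity such as '128Mi', '1.2Gi', or '1000000' to bytes."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)
```

The official kubernetes Python client may offer its own quantity helper; this standalone version just keeps the example free of dependencies.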

Tech Stack

Component          Technology                        Rationale
Language           Python 3.11+                      Fast prototyping, rich K8s client ecosystem
K8s client         kubernetes (official)             Stable, well-maintained, supports all auth methods
Prometheus client  prometheus-api-client             Lightweight, query-focused
CLI framework      Typer                             Modern, type-hinted, auto-generated help
Terminal UI        Rich                              Beautiful tables, progress bars, dashboards
Package manager    uv                                Fast dependency resolution and virtualenv management
Build system       pyproject.toml (hatch/hatchling)  Modern Python packaging standard
Testing            pytest + pytest-mock              Industry standard, good K8s mocking support
Linting            Ruff                              Fast, replaces flake8 + isort + black
Type checking      mypy                              Catch bugs early

Planned CLI Interface

# Basic scan of current cluster context
kube-cost-lens scan

# Scan specific namespaces
kube-cost-lens scan --namespace production --namespace staging

# Exclude system namespaces
kube-cost-lens scan --exclude-namespace kube-system --exclude-namespace cert-manager

# Output as JSON for pipeline consumption
kube-cost-lens scan --output json > report.json

# Set custom cost rates
kube-cost-lens scan --cpu-cost-hour 0.032 --memory-cost-gb-hour 0.004

# Use a specific kubeconfig / context
kube-cost-lens scan --kubeconfig ~/.kube/prod-config --context prod-eu-west-1

# Historical analysis via Prometheus
kube-cost-lens scan --prometheus-url http://prometheus:9090 --window 30d

# CI mode: fail if cluster waste exceeds 40%
kube-cost-lens scan --ci --max-waste-percent 40

# Generate Kubernetes patches
kube-cost-lens recommend --output patches --target-dir ./patches/

# Cluster-level summary
kube-cost-lens summary
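
The following sketch shows how a subset of the scan flags and the CI exit code could fit together. It uses argparse from the standard library as a stand-in for Typer so the example has no dependencies. The exit code value 1 is an assumption (the plan only says non-zero), and `ci_exit_code` is a hypothetical helper.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser covering a few of the planned `scan` options."""
    parser = argparse.ArgumentParser(prog="kube-cost-lens")
    commands = parser.add_subparsers(dest="command", required=True)
    scan = commands.add_parser("scan", help="Analyze requests vs. usage")
    # Repeatable flags collect their values into lists.
    scan.add_argument("--namespace", action="append", default=[])
    scan.add_argument("--exclude-namespace", action="append", default=[])
    scan.add_argument("--output", choices=["table", "json", "yaml", "csv"], default="table")
    scan.add_argument("--ci", action="store_true")
    scan.add_argument("--max-waste-percent", type=float, default=None)
    return parser


def ci_exit_code(cluster_waste_percent: float, ci: bool, max_waste_percent: Optional[float]) -> int:
    """Return 1 when CI mode is on and waste exceeds the threshold, else 0."""
    if ci and max_waste_percent is not None and cluster_waste_percent > max_waste_percent:
        return 1
    return 0


if __name__ == "__main__":
    args = build_parser().parse_args()
    measured_waste = 62.0  # Placeholder; a real run would compute this from the scan.
    raise SystemExit(ci_exit_code(measured_waste, args.ci, args.max_waste_percent))
```

Outside CI mode the exit code stays 0 regardless of waste, so interactive use never fails a shell pipeline by surprise.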

Example Output (Planned)

╭─────────────────────────────────────────────────────────────────╮
│                  kube-cost-lens · Cluster Report                │
│                  Context: prod-eu-west-1                        │
│                  Namespaces: 12 scanned                         │
╰─────────────────────────────────────────────────────────────────╯

 Namespace     Workload              CPU Req  CPU Used  Mem Req   Mem Used  Waste   Status
─────────────────────────────────────────────────────────────────────────────────────────────
 production    api-gateway           2000m    340m      4Gi       1.2Gi     72%     ⚠ oversized
 production    auth-service          1000m    780m      2Gi       1.8Gi     14%     ✓ right-sized
 production    payment-worker        500m     45m       1Gi       128Mi     91%     ✗ oversized
 staging       frontend              1000m    12m       2Gi       64Mi      97%     ✗ oversized
 monitoring    prometheus            4000m    1200m     8Gi       5.2Gi     42%     ⚠ oversized
 ...

 Cluster Summary
──────────────────────────────────────
  Total CPU requested:      24.5 cores
  Total CPU used:            8.2 cores    (33%)
  Total Memory requested:   48 Gi
  Total Memory used:        18.4 Gi      (38%)
  Estimated monthly waste:  $847.20

  3 workloads critically oversized (>80% waste)
  5 workloads moderately oversized (40-80% waste)
  4 workloads right-sized (<20% waste)

Project Structure (Planned)

kube-cost-lens/
├── src/
│   └── kube_cost_lens/
│       ├── __init__.py
│       ├── cli.py              # Typer CLI entrypoint and commands
│       ├── scanner/
│       │   ├── __init__.py
│       │   ├── kubernetes.py   # K8s API client, workload discovery
│       │   ├── metrics.py      # Metrics API integration
│       │   └── prometheus.py   # Prometheus/Thanos query client
│       ├── analyzer/
│       │   ├── __init__.py
│       │   ├── waste.py        # Waste calculation engine
│       │   ├── classifier.py   # Workload classification logic
│       │   └── recommender.py  # Rightsizing recommendation engine
│       ├── reporter/
│       │   ├── __init__.py
│       │   ├── table.py        # Rich terminal table output
│       │   ├── json.py         # JSON/YAML serialization
│       │   ├── csv.py          # CSV export
│       │   └── ci.py           # CI mode (exit codes, thresholds)
│       └── models.py           # Pydantic models for workloads, metrics, recommendations
├── tests/
│   ├── conftest.py
│   ├── fixtures/               # Sample K8s API responses
│   ├── test_scanner.py
│   ├── test_analyzer.py
│   └── test_reporter.py
├── examples/
│   ├── github-actions.yml      # Example CI workflow
│   └── sample-report.json      # Example output
├── pyproject.toml
├── LICENSE
└── README.md

Design Decisions

Why not just use VPA recommendations?

VPA (Vertical Pod Autoscaler) is great but has limitations:

  • Requires installation in the cluster (CRDs + controller).
  • Recommendations are scoped to a single workload (one VPA object per target), not aggregated per namespace or cluster.
  • No cost estimation or waste scoring.
  • No CI/CD integration or threshold-based alerts.

kube-cost-lens complements VPA by providing a bird's-eye view with cost context, and can even compare its recommendations against VPA's.

Why read-only / zero-install?

Security-conscious teams (especially in regulated environments) resist installing agents in production clusters. A read-only CLI that runs from a developer's laptop or a CI pipeline removes that friction entirely. The only requirements are a kubeconfig with get and list permissions and, for live usage data, a cluster that already serves the Metrics API (typically via metrics-server).

Why Python over Go?

  • Faster iteration for a CLI-focused tool.
  • Excellent Kubernetes client library.
  • Rich ecosystem for terminal UI (Rich, Textual).
  • Lower barrier for contributions from platform/DevOps engineers.
  • Performance is not a bottleneck — the limiting factor is API call latency, not computation.

RBAC Requirements

The tool requires minimal read-only permissions. Example ClusterRole:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-cost-lens-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces", "resourcequotas"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets", "daemonsets", "replicasets"]
    verbs: ["get", "list"]
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list"]
  - apiGroups: ["metrics.k8s.io"]
    resources: ["pods"]
    verbs: ["get", "list"]
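
As an offline illustration of what this role does and does not grant, the rules can be mirrored as Python data and matched by API group, resource, and verb. This is a simplified sketch: `is_allowed` is a hypothetical helper, it ignores wildcards and resourceNames, and it is not a substitute for checking a live cluster with `kubectl auth can-i`.

```python
# The ClusterRole rules above, mirrored as plain dictionaries.
RULES = [
    {"apiGroups": [""], "resources": ["pods", "namespaces", "resourcequotas"], "verbs": ["get", "list"]},
    {"apiGroups": ["apps"], "resources": ["deployments", "statefulsets", "daemonsets", "replicasets"], "verbs": ["get", "list"]},
    {"apiGroups": ["batch"], "resources": ["jobs", "cronjobs"], "verbs": ["get", "list"]},
    {"apiGroups": ["metrics.k8s.io"], "resources": ["pods"], "verbs": ["get", "list"]},
]


def is_allowed(rules: list, api_group: str, resource: str, verb: str) -> bool:
    """True if any rule lists the group, the resource, and the verb together."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )
```

Here `is_allowed(RULES, "apps", "deployments", "list")` is True, while `is_allowed(RULES, "apps", "deployments", "patch")` and `is_allowed(RULES, "", "secrets", "get")` are both False: the role can read workloads and metrics but cannot modify anything or read secrets.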

Related & Prior Art

Tool         Comparison
Kubecost     Full platform, requires in-cluster install, commercial tiers
CAST AI      SaaS, agent-based, broader scope (autoscaling, spot)
kubectl top  Real-time only, no recommendations, no aggregation
Goldilocks   VPA-based, requires VPA install, namespace-scoped dashboard
Krr          Closest alternative: Prometheus-based, Python, good inspiration

kube-cost-lens differentiates by working without any in-cluster dependency, providing CI/CD integration, and generating ready-to-apply patches.

Contributing

Contributions are welcome. Please open an issue to discuss your idea before submitting a PR.

This project follows:

  • Conventional Commits for commit messages.
  • Trunk-based development with short-lived feature branches.
  • All code must pass ruff check, ruff format --check, and mypy before merge.

License

Apache License 2.0
