Releases: pambrose/prometheus-proxy
v3.2.0
Prometheus Proxy 3.2.0 focuses on security hardening, fail-fast validation, a full end-to-end Testcontainers suite, richer observability, and a major documentation expansion (including a Kubernetes deployment guide).
🚀 New Features
- Pre-shared agent token for authenticating agent gRPC connections: `--agent_token`/`AGENT_TOKEN` (`proxy.agentToken`/`agent.agentToken`). The proxy rejects any RPC with a missing or mismatched token (`UNAUTHENTICATED`, constant-time digest comparison). Empty (the default) preserves the existing open behavior and logs a startup warning unless mutual TLS is configured. The value is never logged.
- Per-CA HTTPS trust store for the agent's scrape client: `--https_truststore`/`--https_truststore_password` (`HTTPS_TRUST_STORE_PATH`/`HTTPS_TRUST_STORE_PASSWORD`) verify HTTPS targets against a custom/private CA without disabling validation.
- Full Testcontainers end-to-end suite (`io.prometheus.containers`): a smoke test plus seven specs over real Netty/Docker (proxy HTTP surfaces, agent-token auth, consolidated merge, chunked + gzipped large payloads, agent reconnect, gRPC TLS, HTTPS targets), plus a parameter-driven `ContainersScalingTest`. All gated on `RUN_CONTAINER_TESTS=true`.
- `make help`, a `make container-tests` target with Docker-context auto-detection, and a `container-tests` GitHub workflow.
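The constant-time digest comparison mentioned for the agent token can be sketched roughly as follows. This is an illustration only, not the proxy's code: the class name `TokenCheck`, the choice of SHA-256 as the digest, and the handling of a null token are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of prometheus-proxy.
final class TokenCheck {
    // Hash a token so both sides of the comparison have the same length.
    static byte[] sha256(String token) {
        try {
            return MessageDigest.getInstance("SHA-256")
                .digest(token.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Returns true only when the presented token matches the expected one.
    static boolean matches(String expected, String presented) {
        if (presented == null) {
            return false; // assumption: a missing token never authenticates
        }
        // MessageDigest.isEqual does not stop at the first differing byte.
        return MessageDigest.isEqual(sha256(expected), sha256(presented));
    }

    public static void main(String[] args) {
        System.out.println(matches("s3cret", "s3cret"));
        System.out.println(matches("s3cret", "guess"));
    }
}
```

Comparing digests rather than raw strings keeps both inputs the same length, so the comparison time does not reveal how long the expected token is.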
🔐 Security
- Mitigates the unauthenticated agent-registration / path-hijacking finding via the optional pre-shared agent token above.
- Prevents agent-supplied service-discovery labels from overriding proxy-computed reserved keys (`__metrics_path__`, `agentName`, `hostName`).
- Redacts query-parameter values (not just `user:pass@` userinfo) wherever a scrape URL is logged or echoed to Prometheus, so secrets in `?token=…` no longer leak.
- Derives the agent `HttpClientCache` key from a salted HMAC-SHA256 digest instead of plaintext `username:password`.
- Bounds the agent's scrape response-body read (reads at most `maxContentLength + 1` bytes), closing an OOM path for targets with no/understated `Content-Length`.
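The "read at most `maxContentLength + 1` bytes" idea from the last item can be illustrated with a small sketch. The class and method names are hypothetical, the exception type is an assumption, and the limit is assumed to be well below `Integer.MAX_VALUE`; the agent's actual implementation may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper; not part of prometheus-proxy.
final class BoundedRead {
    // Buffers at most maxContentLength + 1 bytes, regardless of Content-Length.
    static byte[] readBounded(InputStream body, int maxContentLength) {
        try {
            byte[] data = body.readNBytes(maxContentLength + 1);
            // Getting the extra byte proves the body is larger than allowed.
            if (data.length > maxContentLength) {
                throw new IllegalStateException(
                    "response body exceeds " + maxContentLength + " bytes");
            }
            return data;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ok = readBounded(new ByteArrayInputStream(new byte[10]), 10);
        System.out.println("read " + ok.length + " bytes");
    }
}
```

Reading one byte past the limit is what distinguishes a body of exactly the maximum size from one that is too large, without trusting any header the target sent.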
🔁 Behavior Changes
- Embedded agents (`startAsyncAgent`, `exitOnMissingConfig=false`) now throw the new public `io.prometheus.common.ConfigLoadException` on a config-load failure instead of calling `exitProcess`; standalone agents and the proxy still exit.
- `ProxyOptions`, `BaseOptions`, and `AgentOptions` now validate ports, gRPC timeouts, and scrape/inactivity timeouts at startup, failing fast with clear messages.
- Removed the deprecated `all` log level; use `trace` for the most verbose output.
- Per-request call logging emits at DEBUG instead of INFO when enabled.
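As a rough picture of what fail-fast option validation looks like, here is a minimal sketch. The helper names, the accepted ranges (ports in 1..65535, strictly positive timeouts), and the message wording are assumptions; the release notes do not state the exact rules the options classes enforce.

```java
// Hypothetical helper; not part of prometheus-proxy.
final class StartupValidation {
    // Assumed rule: a usable TCP port lies in 1..65535.
    static int requirePort(String name, int value) {
        if (value < 1 || value > 65535) {
            throw new IllegalArgumentException(
                name + " must be in 1..65535, got " + value);
        }
        return value;
    }

    // Assumed rule: timeouts must be strictly positive.
    static int requirePositiveSecs(String name, int value) {
        if (value <= 0) {
            throw new IllegalArgumentException(
                name + " must be > 0 seconds, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(requirePort("proxy port", 8080));
        System.out.println(requirePositiveSecs("scrape timeout", 15));
    }
}
```

Running such checks once at startup surfaces a bad value as a clear error message instead of a confusing failure later at bind or scrape time.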
📊 Observability
- Added an `outcome` label to `proxy_scrape_request_latency_seconds` and now record latency for the timeout and agent-disconnected paths.
- Labeled `proxy_start_time_seconds` with a per-process `launch_id`.
- Count scrape results dropped on connection-close as `agent_scrape_result_count{type="dropped"}`.
🐛 Bug Fixes
- Fixed `DnsNameResolverProvider`/`PickFirstLoadBalancerProvider` missing from the shaded `agentJar`/`proxyJar` (Shadow 9.4.2 dropped same-named `META-INF/services` entries), which had made gRPC default to the `unix` scheme on non-IP hostnames.
- Fixed embedded `Agent.stop()`/`EmbeddedAgentInfo.shutdown()` spawning a zombie reconnect thread; shutdown now routes through the Guava lifecycle and blocks until terminated.
- Reject registration of multi-segment paths (e.g. `app/metrics`) that appeared in service discovery but 404'd at scrape time.
- Fixed `appendQueryParams` URL-decoding the encoded query blob before concatenation.
- Fixed a per-response processing error blocking the HTTP handler until `scrapeRequestTimeoutSecs`; the scrape now fails immediately.
- Fixed flaky `ProxyHttpRoutesTest` connection-reset by replacing the TCP probe with an HTTP-level readiness check.
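The multi-segment path rejection above amounts to a simple shape check at registration time. The sketch below is one way to express it; the class name, the tolerance for a single leading slash, and the rejection of empty paths are assumptions rather than the proxy's documented rules.

```java
// Hypothetical helper; not part of prometheus-proxy.
final class PathCheck {
    // A path counts as single-segment when, after dropping one optional
    // leading slash, it is non-empty and contains no further '/'.
    static boolean isSingleSegment(String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        return !trimmed.isEmpty() && !trimmed.contains("/");
    }

    public static void main(String[] args) {
        System.out.println(isSingleSegment("app_metrics"));
        System.out.println(isSingleSegment("app/metrics"));
    }
}
```

Rejecting such a path when the agent registers it keeps service discovery from advertising a target that can never be scraped.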
🐳 Docker Images
- The proxy/agent images now run on Java 25 (LTS) via the `eclipse-temurin:25-jre` base (Ubuntu-based, pinned by manifest-list digest), replacing the previous `alpine` + `apk add openjdk17-jre` build. The published fat JARs remain Java 17 bytecode and run unchanged on the newer JRE, so self-run and embedded-agent usage still requires only Java 17.
- Multi-arch coverage is amd64, arm64, s390x, and ppc64le (Temurin is one of the few JDK distributions that publishes the s390x/ppc64le ports).
- The container `ENTRYPOINT` adds `--enable-native-access=ALL-UNNAMED` and `--sun-misc-unsafe-memory-access=allow` to silence the JDK 25 startup warnings from jansi and netty.
📚 Documentation
- New Kubernetes deployment guide with ready-to-use proxy/agent manifests, standalone and sidecar patterns, gRPC exposure, and Prometheus Operator (`ServiceMonitor`) integration.
- New glossary, troubleshooting, production, Grafana & alerting, and example-config pages; README now links into the docs site per section.
🧰 Build, Tooling & Code Quality
- Removed dead config keys (`proxy.http.maxThreads`/`minThreads`, `proxy.internal.scrapeRequestCheckMillis`) and annotated unimplemented knobs.
- `BuildConfig` timestamps now read fresh each build via `ValueSource`.
- Moved detekt config to `config/detekt/` and wired `detekt` into `make lint`; tests run in CI with Kover coverage uploaded to Codecov.
- Extensive no-behavior-change refactors across the agent, proxy, and common modules, plus broad new test coverage.
📦 Artifacts
Docker:
docker pull pambrose/prometheus-proxy:3.2.0
docker pull pambrose/prometheus-agent:3.2.0
Maven Central:
implementation("com.pambrose:prometheus-proxy:3.2.0")
See CHANGELOG.md for the complete, itemized list of changes.
Full Changelog: 3.1.1...3.2.0
v3.1.1
Maintenance release focused on public-API documentation, reproducible builds, flaky test fixes, and dependency updates.
Highlights
- Documented public API: Full KDoc on every `@Parameter` field of `BaseOptions`/`AgentOptions`/`ProxyOptions`, every `EnvVars` value, the `Agent` and `Proxy` companion entry points, and `EmbeddedAgentInfo`, covering resolution precedence (CLI → env → config → default), sentinel values, and validation rules.
- Reproducible builds: `BuildConfig.APP_RELEASE_DATE` and `BuildConfig.BUILD_TIME` accept `-PoverrideReleaseDate`/`-PoverrideBuildTime` Gradle properties so CI can produce bit-identical artifacts.
- Flaky test fixes: Replaced timing-based probes in `AgentTest` and `AgentHttpServiceTest` with deterministic readiness gates, eliminating two long-standing CI flakes.
- Cleaner build script: Centralized repositories in `settings.gradle.kts`, dropped the redundant fat-jar rewrap, removed the redundant `java` plugin alias, and aligned `dependsOn` calls on `tasks.named()`.
Bug Fixes
- Fix flaky `AgentTest` "Bug #1" coroutine backpressure test: the sample point could land between batches and observe 0 active coroutines. Replaced with a deterministic `CompletableDeferred` gate plus Kotest `eventually()` for scheduler jitter.
- Fix flaky `AgentHttpServiceTest`: the fixed 100 ms post-`server.start` delay was insufficient on busy machines. Replaced with an active TCP-connect probe (20 ms poll, 5 s deadline).
Build & Tooling
- Centralize repository declarations in `settings.gradle.kts` via `dependencyResolutionManagement` (`FAIL_ON_PROJECT_REPOS`); `mavenLocal()` is opt-in with `-PuseMavenLocal=true`.
- Replace the `agentJar`/`proxyJar` `zipTree`-rewrap with two `ShadowJar` tasks (configuration-cache safe; one fewer redundant fat jar on disk).
- Drop the redundant `java` plugin (applied transitively by `kotlin.jvm`).
- Switch `compileKotlin.dependsOn(":generateProto")` to `tasks.named("generateProto")` for type-safe task references.
- Mark the internal `Utils` object as `internal`.
- Hoist `formatter`, `releaseDate`, and `buildTime` out of the `buildConfig {}` block to top-level `val`s.
- Centralize test server readiness in a shared `startServerAndGetPort` helper.
- Add `check-gpg-env` Makefile target for GPG signing validation.
- Fix the date format passed by the `build` and `local-build` Makefile targets to match the `MM/dd/yyyy` pattern parsed by `build.gradle.kts`.
- Add Claude Code GitHub workflow.
Dependency Updates
| Dependency | Old | New |
|---|---|---|
| Kotlin | 2.3.20 | 2.3.21 |
| Gradle wrapper | 9.4.1 | 9.5.0 |
| Ktor | 3.4.2 | 3.4.3 |
| serialization | 1.10.0 | 1.11.0 |
| tcnative | 2.0.74.Final | 2.0.77.Final |
| utils | 2.7.1 | 2.8.1 |
| gradle-plugins | 1.0.12 | 1.0.14 |
| protobuf | 0.9.6 | 0.10.0 |
| taskinfo | 3.0.1 | 3.0.2 |
Full Changelog: 3.1.0...3.1.1
v3.1.0
Breaking Changes
- Maven coordinates changed: Published to Maven Central as `com.pambrose:prometheus-proxy`
- JitPack is no longer used; all dependencies resolve from Maven Central
New Features
- Add Zensical documentation site with comprehensive guides, code examples, and architecture diagrams
- Publish documentation to GitHub Pages via CI
Build & Tooling
- Migrate publishing from JitPack to Maven Central using vanniktech maven-publish plugin
- Replace manual `maven-publish` + sources/javadoc JAR tasks with `mavenPublishing` DSL
- Remove JitPack plugin resolution strategy from `settings.gradle.kts`
- Remove `jitpack.yml`
- Add GPG signing for Maven Central (skipped when no key is provided)
- Add `google()` repository to build script
- Add `overrideVersion` property support for snapshot publishing
- Import `VisibilityModifier` directly instead of using fully qualified name in Dokka config
Documentation
- Add full documentation site in `website/prometheus-proxy/` with 13 pages covering architecture, getting started, configuration, security/TLS, Docker, embedded agent, service discovery, monitoring, CLI reference, and advanced topics
- Add code example snippets imported via pymdownx.snippets
- Extract Java/Kotlin code examples into compilable source files so API changes are caught by the compiler
- Add mkdocs-material dependency and grid card layouts for Next Steps sections
- Add markdown extensions: admonition, details, attr_list, md_in_html, pymdownx.emoji with material icon support
- Add KDocs nav entry with API Reference section
- Update README.md with Maven Central badge, documentation site link, and dependency coordinates
Dependencies
- Bump utils to 2.7.1
- Bump Kotest to 6.1.10, Ktor to 3.4.2, Logback to 1.5.32
- Bump gradle-plugins to 1.0.12, Protoc to 4.34.1, Dropwizard to 4.2.38
- Bump Dokka to 2.2.0, maven-publish plugin to 0.36.0, Kover to 0.9.8
Metrics & Observability
- Add new proxy metrics: `proxy_chunk_validation_failures_total`, `proxy_chunked_transfers_abandoned_total`, `proxy_agent_displacement_total`, `proxy_scrape_response_bytes`
- Convert proxy and agent latency metrics from summaries to histograms
- Add new agent metrics: `agent_client_cache_size`, `agent_scrape_backlog_size`
- Add `path` and `encoding` labels to proxy response metrics
- Rebuild Grafana dashboards for new metric schema
Bug Fixes
- Fix flaky `HttpClientCacheTest` by ensuring deterministic LRU eviction order
- Fix scrape response bytes metric to observe correct unzipped size
Misc
- Use portable bash shebang (`#!/usr/bin/env bash`) in `bin/` scripts
- Extract Docker image version from `build.gradle.kts` in `bin/` scripts
- Remove `.superset` config files
- Remove legacy files and clean up `.gitignore`
Full Changelog: 3.0.3...3.1.0
v3.0.3
Dependency Updates
| Dependency | Old | New |
|---|---|---|
| Kotlin | 2.3.10 | 2.3.20 |
| Gradle wrapper | 9.2.0 | 9.4.0 |
| gRPC | 1.79.0 | 1.80.0 |
| Kotest | 6.1.3 | 6.1.7 |
| Protoc | 4.33.5 | 4.34.0 |
| utils | 2.5.3 | 2.6.3 |
| gradle-plugins | 1.0.8 | 1.0.10 |
| config plugin | 6.0.7 | 6.0.9 |
Build & Tooling
- Extract JitPack URLs into reusable Makefile variables (`JITPACK_BUILD_URL`, `JITPACK_API_URL`)
- Enable Gradle configuration caching and daemon for faster builds
- Add homepage link to plugins configuration in `build.gradle.kts`
- Update `.gitignore` to include test configuration files
- Use `forEach` instead of `map` in coroutine launches for clarity in `AgentConnectionContextTest`
Documentation & Cleanup
- Add GitHub workflow commands and API documentation section to README
- Remove outdated GEMINI.md, AGENTS.md, and OpenSpec instructions
- Remove legacy documentation and workflows
- Clean up CLAUDE.md
See Release Notes for full details.
v3.0.0
Prometheus Proxy 3.0.0 (AKA Claude Code massive cleanup)
Bug Fixes
Data Integrity & Correctness
- Fix integer overflow in `ChunkedContext.totalByteCount` (Int → Long) that could silently bypass size limits on large payloads
- Fix chunk checksum calculation to use actual byte count instead of full buffer size
- Fix `toScrapeResponseHeader` to propagate the actual `srZipped` value (was hardcoded to `true`)
- Fix `applySummary` to propagate the `headerZipped` value from chunked response headers
- Fix `IOException` error code from `NotFound` (404) to `ServiceUnavailable` (503), which is semantically correct for unreachable targets
- Fix catch-all HTTP exception handler from `NotFound` (404) to `InternalServerError` (500)
- Fix `errorCode()` to walk the exception cause chain for wrapped timeout exceptions
- Fix OpenMetrics `# EOF` marker handling in consolidated responses: intermediate `# EOF` markers are now stripped
- Fix `parseHostPort` to strip brackets from IPv6 addresses in `HostPort`: `[::1]:50051` now yields host `::1` instead of `[::1]`
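The Int → Long overflow fix in the first item is easy to reproduce in isolation. The sketch below uses hypothetical names and a made-up limit; it only shows why a 32-bit running total can slip under a size check, not how `ChunkedContext` is actually written.

```java
// Hypothetical illustration; not part of prometheus-proxy.
final class ByteCounter {
    // Buggy shape: an int total wraps negative past Integer.MAX_VALUE,
    // so the comparison against the limit wrongly reports "within limit".
    static boolean exceedsLimitInt(int[] chunkSizes, long limit) {
        int total = 0;
        for (int size : chunkSizes) {
            total += size;
        }
        return total > limit;
    }

    // Fixed shape: a long total cannot wrap for any realistic payload.
    static boolean exceedsLimitLong(int[] chunkSizes, long limit) {
        long total = 0L;
        for (int size : chunkSizes) {
            total += size;
        }
        return total > limit;
    }

    public static void main(String[] args) {
        int[] chunks = {Integer.MAX_VALUE, 1};
        long limit = 10L * 1024 * 1024; // illustrative 10 MB limit
        System.out.println(exceedsLimitInt(chunks, limit));
        System.out.println(exceedsLimitLong(chunks, limit));
    }
}
```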
Concurrency & Resource Management
- Fix TOCTOU race in `AgentContextCleanupService`: agents are now re-checked for staleness before eviction
- Fix negative `scrapeRequestBacklogSize` with atomic CAS-loop decrement clamped at zero
- Fix `ConcurrentModificationException` in `ProxyPathManager.removePathsForAgentId` and `recentReqs` access
- Fix `HttpClientCache.close()` deadlock: coroutine scope cancelled before acquiring mutex
- Fix HTTP client close calls moved outside mutex to avoid blocking cache operations during slow I/O
- Fix idle HTTP clients now closed on eviction (previously only marked for close)
- Fix `AgentHttpService` now properly closed during agent shutdown (resource leak)
- Fix path registration concurrency by moving gRPC calls outside the mutex
- Fix `AgentClientInterceptor` to use the `next` channel parameter instead of bypassing the interceptor chain
- Fix synchronized `agentId` assignment in `AgentClientInterceptor` to prevent race condition
- Fix `ScrapeRequestWrapper.markComplete()` is now idempotent via `AtomicBoolean.compareAndSet`
- Fix `runCatching` replaced with `runCatchingCancellable` throughout to avoid swallowing `CancellationException`
- Fix agent context added after ID validation to prevent orphaned contexts
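Two of the patterns named above, the CAS-loop decrement clamped at zero and the idempotent completion flag, can be sketched with the standard `java.util.concurrent.atomic` classes. The class and method names here are hypothetical and the code is a generic illustration, not the project's implementation.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not part of prometheus-proxy.
final class AtomicPatterns {
    // Decrements the counter unless it is already at or below zero.
    // Returns the counter value after the call.
    static int decrementClampedAtZero(AtomicInteger counter) {
        while (true) {
            int current = counter.get();
            if (current <= 0) {
                return current; // never push the counter below zero
            }
            if (counter.compareAndSet(current, current - 1)) {
                return current - 1;
            }
            // Another thread changed the value first; retry with a fresh read.
        }
    }

    // Returns true for the first caller only; later calls are no-ops.
    static boolean markComplete(AtomicBoolean completed) {
        return completed.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        AtomicInteger backlog = new AtomicInteger(1);
        System.out.println(decrementClampedAtZero(backlog));
        System.out.println(decrementClampedAtZero(backlog));
        AtomicBoolean done = new AtomicBoolean(false);
        System.out.println(markComplete(done));
        System.out.println(markComplete(done));
    }
}
```

In both cases the check and the update happen in a single atomic step, which is what removes the window a plain read-then-write would leave open.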
Error Handling & Cleanup
- Fix orphaned `ChunkedContext` cleanup on stream failure: associated scrape requests are now explicitly failed
- Fix chunk validation errors now throw `ChunkValidationException` instead of crashing the gRPC stream
- Fix `readRequestsFromProxy` throws `StatusException(NOT_FOUND)` when agent context is missing (was silently no-op)
- Fix `connectAgent`/`connectAgentWithTransportFilterDisabled` throw `StatusException(FAILED_PRECONDITION)` instead of `RequestFailureException`
- Fix `sendHeartBeat` re-throws `NOT_FOUND` status to trigger agent reconnection (was zombie state)
- Fix agent invalidation now drains pending scrape requests and unblocks HTTP handlers immediately
- Fix `handleConnectionFailure` re-throws JVM `Error` subclasses instead of retrying in a corrupted state
- Fix stream cleanup for `transportFilterDisabled` mode in `readRequestsFromProxy` finally block
Security
- Fix credential leak in `HttpClientCache` logs: `ClientKey.toString()` now masks credentials
- Fix password `CharArray` zeroed after use in `SslSettings.getKeyStore`
- Fix `FileInputStream` resource leak in `SslSettings`: now uses try-with-resources
- Fix URL sanitization in agent logs to strip credentials before logging
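The credential-masking and password-zeroing fixes follow two common patterns, sketched below. `MaskedKey` and `withPassword` are hypothetical names, and the exact mask format is an assumption; the sketch is not the project's `ClientKey` or `SslSettings` code.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical illustration; not part of prometheus-proxy.
final class MaskedKey {
    private final String username;
    private final String password;

    MaskedKey(String username, String password) {
        this.username = username;
        this.password = password;
    }

    // Never include the real credentials in anything that may be logged.
    @Override
    public String toString() {
        return "MaskedKey(username=****, password=****)";
    }

    // Runs the given action with the password, then wipes the array
    // even if the action throws.
    static <T> T withPassword(char[] password, Function<char[], T> use) {
        try {
            return use.apply(password);
        } finally {
            Arrays.fill(password, '\0');
        }
    }

    public static void main(String[] args) {
        System.out.println(new MaskedKey("alice", "hunter2"));
        char[] pw = {'p', 'w'};
        System.out.println(withPassword(pw, p -> p.length));
    }
}
```

A `char[]` can be overwritten once it is no longer needed, whereas a `String` stays in memory until it is garbage collected, which is why password APIs such as `KeyStore.load` take a `char[]`.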
Misc
- Fix gzip compression for small responses: enforced `minimumSize(1024)` in `ProxyHttpConfig`
- Fix redundant `response.status()` call in `ProxyUtils.respondWith`
- Fix service discovery and metrics paths now ensure leading `/`
- Fix dynamic parameter handling to correctly set system properties
- Fix `registerPath`/`registerAgent`/`sendHeartBeat` responses only set `reason` field when `valid` is false
- Fix typo: "Overide" → "Override" in config and ConfigVals
New Features
- Content size limits: New configurable limits to prevent zip bombs and unbounded memory:
  - `proxy.internal.maxZippedContentSizeMBytes` (default 5 MB)
  - `proxy.internal.maxUnzippedContentSizeMBytes` (default 10 MB)
  - `agent.http.maxContentLengthMBytes`/`AGENT_MAX_CONTENT_LENGTH_MBYTES` (default 10 MB)
- Unary RPC deadline: `agent.grpc.unaryDeadlineSecs`/`UNARY_DEADLINE_SECS` (default 30s) prevents unary gRPC calls from hanging indefinitely
- Graceful scrape request failure: Orphaned scrape requests are failed with proper status on agent disconnect, stream termination, chunk validation failure, and proxy shutdown
- Consolidated/non-consolidated mismatch rejection: `addPath` now rejects mismatched agent types on the same path with a descriptive error
- Authorization header TLS warning: One-time warning logged when auth headers are sent over non-TLS connections
- HTTP request lifecycle: `cancelCallOnClose = true` cancels HTTP requests when clients disconnect
- Bounded scrape request channel: Agent-side channel now has configurable backpressure instead of unlimited capacity
- Outer scrape timeout: `withTimeout` wrapper in `fetchContent()` as safety net beyond Ktor client timeout
- Strict env var parsing: Boolean env vars only accept `"true"`/`"false"`; integer/long env vars throw descriptive errors on invalid values
- "all" log level: `setLogLevel` now accepts "all" as a valid level
- Input validation: `parseHostPort` validates blank strings; `parsePort` validates port ranges
- TLS config validation: Requires both certificate and key for TLS; warns on disabled X.509 verification
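Strict environment-variable parsing, as described in the list above, might look roughly like this. The class name, the variable names, and the error wording are hypothetical, and exact lowercase matching is an assumption since the notes do not say whether case is ignored.

```java
// Hypothetical helper; not part of prometheus-proxy.
final class StrictEnv {
    // Accepts only the literal strings "true" and "false".
    // Assumption: matching is case-sensitive.
    static boolean parseBoolean(String name, String raw) {
        if ("true".equals(raw)) {
            return true;
        }
        if ("false".equals(raw)) {
            return false;
        }
        throw new IllegalArgumentException(
            name + " must be \"true\" or \"false\", got \"" + raw + "\"");
    }

    // Wraps the parse failure in a message that names the variable.
    static int parseInt(String name, String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                name + " must be an integer, got \"" + raw + "\"", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBoolean("EXAMPLE_FLAG", "true"));
        System.out.println(parseInt("EXAMPLE_COUNT", "42"));
    }
}
```

The contrast is with `Boolean.parseBoolean`, which silently returns `false` for any input other than "true" (ignoring case), so a typo such as `yes` would go unnoticed.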
Refactoring
- `ScrapeResults` fields changed from `var` to `val` (fully immutable construction)
- `ResponseResults` and `ScrapeRequestResponse` converted to immutable data classes
- `updateMsg: String` → `updateMsgs: List<String>` in `ResponseResults`
- `ProxyUtils` response functions now return values instead of mutating a passed-in object
- `AgentContextManager` maps made private with accessor methods and read-only views
- `ScrapeRequestManager.scrapeRequestMap` made private with read-only view
- `ProxyPathManager` changed from `ConcurrentMap` to `HashMap` with explicit `synchronized` blocks
- `AgentPathManager` uses `ConcurrentHashMap` and `Mutex` for thread-safe registration
- `AgentGrpcService` uses `ReentrantLock` for thread-safe shutdown and stub creation
- gRPC metadata constants consolidated into `GrpcConstants`
- Config file moved: `etc/config/config.conf` → `config/config.conf`
- Detekt config moved: `config/detekt/` → `etc/detekt/`
- `SslSettings` return types changed from nullable to non-nullable
- Scrape request queue changed from `Channel` to `ConcurrentLinkedQueue` with notifier
- Scrape request polling loop replaced with event-driven `awaitCompleted()` suspension
- Proto: reserved field 5 in `RegisterAgentRequest`; added `header_zipped` field 8 to `HeaderData`
Dependency Updates
| Dependency | Old | New |
|---|---|---|
| Kotlin | 2.2.20 | 2.3.10 |
| Gradle wrapper | 8.x | 9.2.0 |
| Ktor | 3.2.3 | 3.4.0 |
| gRPC | 1.75.0 | 1.79.0 |
| Protoc | 4.32.0 | 4.33.5 |
| JCommander | 2.0 | 3.0 |
| Kotest | 6.0.3 | 6.1.3 |
| Logback | 1.5.18 | 1.5.31 |
| MockK | (new) | 1.14.9 |
| tcnative | 2.0.73 | 2.0.74 |
| utils | 2.4.5 | 2.5.3 |
| config plugin | 5.6.8 | 6.0.7 |
| kotlinter | 5.2.0 | 5.4.2 |
| kover | 0.9.1 | 0.9.7 |
| dropwizard | 4.2.36 | 4.2.38 |
| gengrpc | 1.4.3 | 1.5.0 |
| serialization | 1.9.0 | 1.10.0 |
| slf4j | 2.0.13 | 2.0.17 |
| typesafe | 1.4.4 | 1.4.5 |
CI/CD
- Added GitHub Actions CI workflow for building the project on push/PR to `master`
- Added GitHub Actions workflow for deploying Dokka API documentation to GitHub Pages
- Removed Travis CI configuration (`.travis.yml`)
Documentation
- Integrated Dokka for HTML API documentation generation (`./gradlew dokkaHtml`)
- Added KDoc documentation across agent, proxy, and common packages
- Added module and package documentation (`docs/packages.md`)
- Added improvements roadmap document (`docs/improvements.md`)
Testing
- ~26,000+ lines of new unit tests added
- Tests reorganized into `io.prometheus.agent/`, `io.prometheus.proxy/`, `io.prometheus.common/`, `io.prometheus.misc/`
- Added MockK for mocking support
- Compiler option `-Xreturn-value-checker=check` enabled
Breaking Changes
High Impact — Will affect most users monitoring scrape responses
| # | Change | Detail |
|---|---|---|
| 1 | Default scrape failure status: 404 → 503 | ScrapeResults.srStatusCode default changed from NotFound (404) to ServiceUnavailable (503). Any monitoring/alerting keyed on status codes from failed scrapes will see different codes. |
| 2 | IOException scrape error: 404 → 503 | When the agent can't reach the scrape target (connection refused, DNS failure, etc.), the status returned to Prometheus changed from 404 to... |
v2.4.0
- Refactor dependency management in build.gradle.kts and libs.versions.toml
- Fix dependency declaration for Kotlin BOM in build.gradle.kts
- Update dependencies in libs.versions.toml, including gRPC, Jetty, and Kotest, and add protobuf-kotlin entry
- Update dependencies: Dropwizard to 4.2.36, Kotest to 6.0.2, and tcnative to 2.0.73 in libs.versions.toml
- Update plugin and dependency versions in libs.versions.toml
- Update Kotlin to 2.2.20
v2.3.0
- Add support for tuning concurrent endpoint HTTP clients
- Add support for tuning caching endpoint HTTP clients
- Rename concurrent scrapes configuration to maxConcurrentScrapes and update related documentation
- Expose httpClientCache in AgentHttpService and add a healthcheck for cache size
- Introduced `clientTimeoutSecs` to configure HTTP client timeout in seconds, replacing `agent.configVals.agent.internal.cioTimeoutSecs`.
- Update Ktor version to 3.2.2
v2.2.0
v2.1.0
v2.0.0
- Add gRPC keepalive support
- Update Ktor jar to 3.1.0