Per-Share Egress Monitoring#10
Conversation
|
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Pull request overview
Adds per-share egress monitoring to the Delta Sharing reference server by emitting structured access logs and classifying egress into GCP pricing tiers based on client location signals.
Changes:
- Introduces
accessLoggingserver configuration (YAML +AccessLoggingConfig) and wires structured access log emission into query + CDF endpoints. - Adds GCP egress pricing tier classification logic plus an IP-range lookup utility (backed by GCP
cloud.json). - Adds documentation and unit tests for pricing-tier resolution and logging components, plus manifest updates enabling the feature in several environments.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| server/src/universal/conf/delta-sharing-server.yaml.template | Adds perfLoggingEnabled and accessLogging template configuration block. |
| server/src/test/scala/io/delta/sharing/server/telemetry/GcpPricingTierSuite.scala | Unit tests for continent mapping, tier calculation, and egress type detection. |
| server/src/test/scala/io/delta/sharing/server/telemetry/GcpIpRangeLookupSuite.scala | Unit tests for IP→region lookup and trie behavior. |
| server/src/test/scala/io/delta/sharing/server/telemetry/AccessLogEmitterSuite.scala | Unit tests for access log entry models and emitter creation behavior. |
| server/src/test/scala/io/delta/sharing/server/DeltaSharingServiceSuite.scala | Adds tests for egress byte extraction and client/pricing context building. |
| server/src/test/scala/io/delta/sharing/server/config/ServerConfigSuite.scala | Adds tests for YAML parsing/roundtrip of accessLogging config. |
| server/src/main/scala/io/delta/sharing/server/telemetry/GcpPricingTier.scala | Implements tier calculation + egress type detection utilities. |
| server/src/main/scala/io/delta/sharing/server/telemetry/GcpIpRangeLookup.scala | Fetches/parses GCP IP ranges and performs CIDR trie lookups. |
| server/src/main/scala/io/delta/sharing/server/telemetry/AccessLogEmitter.scala | Defines access log models and JSON log emission implementation. |
| server/src/main/scala/io/delta/sharing/server/DeltaSharingService.scala | Emits per-request access logs for query/CDF and adds pricing-context helpers. |
| server/src/main/scala/io/delta/sharing/server/config/ServerConfig.scala | Adds accessLogging to ServerConfig and defines AccessLoggingConfig. |
| README.md | Documents optional share egress access logging and example YAML. |
| manifests/zing-preview/configmap.yaml | Enables access logging in zing-preview base config. |
| manifests/zcloud-prod3/kustomization.yaml | Adds strategic merge patch for prod3 configmap. |
| manifests/zcloud-prod3/configmap.yaml | Adds prod3 base configmap enabling access logging. |
| manifests/zcloud-prod2/configmap.yaml | Enables access logging in zcloud-prod2 base config. |
| manifests/zcloud-prod/configmap.yaml | Enables access logging in zcloud-prod base config. |
| manifests/zcloud-emea/kustomization.yaml | Adds strategic merge patch for emea configmap. |
| manifests/zcloud-emea/configmap.yaml | Adds emea base configmap enabling access logging. |
| manifests/base/deployment.yaml | Extends placeholder substitution and injects GCP_PROJECT_ID env var into init/proxy. |
| manifests/base/configmap.yaml | Enables access logging in base configmap. |
| docs/PER_SHARE_EGRESS_MONITORING.md | New detailed design/usage documentation for per-share egress monitoring. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -0,0 +1,274 @@ | |||
| /* | |||
| * Copyright (2021) The Delta Lake Project Authors. | |||
| * @param clientRegion ISO 3166-1 alpha-2 country code of the client (e.g., "US", "MT") | ||
| * @param requestType Type of request: "query" (snapshot read) or "cdf_stream" (CDF streaming) | ||
| */ | ||
| case class AccessLogEntry( |
There was a problem hiding this comment.
Consider a logId field to help correlate associated logs such as pricing context and others. The same field & value would also be added to the associated logs.
This will help extract related logs and also be useful if we need to retain samples of the pricing context for large access corroboration and audit needs
timestampMs could potentially serve as a proxy for that, but may be ambiguous when there is concurrent activity.
Update: We can wait on this one, since we will be storing the records in DeltaLake.
No description provided.