Feat/auto instrumentation service name#3
Conversation
Problem: When Datadog operator and Last9 OTEL operator run together, DD operator injects DD_SERVICE env var which some SDKs use as fallback, causing service.name to be overridden. Solution: 1. SDK level: Set OTEL_SERVICE_NAME from pod label app.kubernetes.io/name via fieldRef in Instrumentation CRD defaults section 2. Collector level: Add transform/traces/last9 processor with priority chain: - last9.io/service annotation (unconditional override) - last9.io/service-name annotation (legacy, unconditional) - resource.opentelemetry.io/service.name (OTel standard) - app.kubernetes.io/name label - k8s.deployment.name / statefulset / daemonset - app label - k8s.container.name (fallback) Priority order follows Beyla/OTel conventions for ecosystem interoperability. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…name resolution Auto-Instrumentation: - Add namespace-level auto-instrumentation with whitelist/blacklist support - auto-instrument=all: instruments all namespaces except system ones - auto-instrument=ns1,ns2: whitelist specific namespaces - auto-instrument-exclude=ns1,ns2: blacklist with auto-instrument=all - System namespaces always excluded: kube-system, kube-public, last9, etc. Languages enabled by default: - Java, Python, Node.js, .NET, PHP - Go, Apache HTTPD, Nginx available but commented (optional) - Note: Rust requires manual SDK instrumentation (no auto-instrumentation support) Service name resolution (collector-side): - Priority 1: last9.io/service annotation (unconditional override) - Priority 2: last9.io/service-name annotation (legacy) - Priority 3: resource.opentelemetry.io/service.name (OTel standard) - Priority 4: app.kubernetes.io/name label - Priority 5: k8s workload name (deployment/statefulset/daemonset) - Priority 6: app label - Priority 7: container name (fallback) Instrumentation CRD: - Set OTEL_SERVICE_NAME from pod label via fieldRef - Added .NET and PHP configurations - Added commented Go, Apache HTTPD, Nginx configurations Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Auto-detect deployment.environment from Kubernetes metadata with priority: 1. last9.io/env annotation (explicit override) 2. environment pod label 3. app.kubernetes.io/environment label (K8s standard) 4. namespace name (fallback) Also aligns service.name auto-detection for logs with the traces transform. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove OTEL_RESOURCE_ATTRIBUTES with deployment.environment=local from all language configs. The collector transforms now handle auto-detection from K8s metadata (annotations, labels, namespace). This ensures deployment.environment is dynamically set based on actual K8s context rather than a static "local" value. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add badges, "Why This Exists" section - Document all 5 enabled languages (.NET, PHP now included) - Add namespace-level auto-instrumentation docs - Document service.name and deployment.environment auto-detection - Add troubleshooting section - Add architecture diagram - Remove outdated Ruby/Go references - Update one-liner URL to new repo name Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🔍 Test Results - PR #3 Deployment TestingTested the PR changes on a local Minikube cluster with real workloads. Found several critical issues that prevent the PR's features from being deployed. Test Environment
🔴 Critical Issues (Blockers)Issue #1: Setup script downloads from WRONG repository
|
Fixes Applied for All Review CommentsThanks for the thorough testing @ashishl9! All 6 issues have been addressed. Here's a summary: Critical Issues (Blockers) — FIXEDIssue #1: Setup script downloads from WRONG repository
Issue #2: Instrumentation CRD has unsupported fields
Issue #4: OTEL_SERVICE_NAME fieldRef will fail without label
Medium Priority — FIXEDIssue #5: Duplicate service name resolution logic
Issue #6: Environment fallback too permissive
Additional ImprovementsAligned logs and traces transform processors:
Added CI validation pipeline (
Local validation results:
Files Changed
cc @ashishl9 |
- Fix DEFAULT_REPO URL from l9-otel-operator to last9-k8s-observability - Remove invalid CRD fields: spec.defaults.env (not in schema), spec.php (unsupported by operator) - Use spec.defaults.useLabelsForResourceAttributes instead of fieldRef for service.name detection (gracefully handles missing labels) - Add IsMatch() validation for namespace-as-environment fallback to prevent non-environment namespaces from becoming deployment.environment - Align logs and traces transform processors to use same priority chain - Update to latest versions: operator chart 0.105.1, collector 0.145.0 - Add CI validation pipeline (shellcheck, yamllint, kubeconform, helm template, otelcol-contrib validate) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add missing newline at end of instrumentation.yaml - Remove trailing whitespace in collector values - Raise ShellCheck severity to error (pre-existing SC2155 warnings) - Relax yamllint rules for indentation/empty-lines (pre-existing) and increase line-length to 300 for OTTL statements Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Override the OTel Operator's default --require injection with --import to support both ESM and CJS Node.js apps. The .mjs entry point registers the import-in-the-middle hook for ESM and falls back to --require behavior for CJS. Without this, ESM apps get no framework-level instrumentation (missing http.route, incorrect span names like bare GET/POST). Requires autoinstrumentation-nodejs image >= 0.57.0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Critical fixes: - Replace hardcoded cluster.name="minikube" with "unknown-cluster" default The setup script already detects this from kubectl context, but the YAML had a hardcoded dev value that would leak into production High priority fixes: - Remove PHP from script and README (OTel Operator CRD doesn't support PHP) Was injecting inject-php annotations that do nothing, causing confusion - Add validation error when auto-instrument-exclude is used with whitelist mode Previously silently ignored, now fails fast with clear error message Low priority fixes: - Delete unused uninstrument_namespace() function (dead code) - Relax IsMatch regex for deployment.environment to match staging-v2, prod-us-east Changed from exact match ^(prod|...)$ to prefix match ^(prod|...)([-_].*)?$ Documentation: - Document that last9.io/service annotation unconditionally overrides SDK-set service.name, preventing confusion when app config is ignored Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ashishl9
left a comment
There was a problem hiding this comment.
All 7 issues from the initial review have been addressed in the Feb 17 commit. CI pipeline added, cluster.name hardcode fixed, PHP inconsistency resolved, and namespace exclude validation added. LGTM.
Summary
This PR adds robust auto-instrumentation support with automatic
service.nameanddeployment.environmentdetection, namespace-level instrumentation management, prevents conflicts with other operators (like Datadog), enables ESM support for Node.js apps, and fixes 7 configuration/documentation issues.Changes
1. Service Name Resolution
Automatic
service.namedetection from Kubernetes metadata:last9.io/serviceannotation (explicit override)app.kubernetes.io/namelabel (K8s standard)app.kubernetes.io/componentlabelapplabel2. Deployment Environment Auto-Detection
Automatic
deployment.environmentdetection (replaces hardcodedlocal):last9.io/envannotation (explicit override)environmentlabelapp.kubernetes.io/environmentlabel (K8s standard)staging-v2,prod-us-east)3. Namespace-Level Auto-Instrumentation
auto-instrument=all— instrument all namespaces (excluding system)auto-instrument=ns1,ns2— whitelist specific namespacesauto-instrument-exclude=ns1,ns2— blacklist specific namespaces (only works withauto-instrument=all)System namespaces auto-excluded:
kube-system,kube-public,kube-node-lease,cert-manager,istio-system,linkerd,monitoring,prometheus,grafana,argocd,flux-system,last94. Multi-Language Support
5. Node.js ESM Auto-Instrumentation
Overrides the OTel Operator's default
--requireinjection with--importto support both ESM and CJS Node.js apps. Without this, ESM apps get no framework-level instrumentation (missinghttp.route, incorrect span names like bareGET/POST).--import /otel-auto-instrumentation-nodejs/autoinstrumentation.mjs.mjsentry point registersimport-in-the-middlehooks for ESM--requirebehavior for CJS apps (backward compatible)autoinstrumentation-nodejsimage >= 0.57.06. Bug Fixes (7 issues resolved)
Critical
cluster.name: "minikube"in collector values"unknown-cluster"default that matches script's auto-detectionkubectl contextorcluster=parameterHigh Priority
Removed PHP false advertising
inject-phpannotation injection from setup scriptPHP | Not Supported | CRD doesn't support PHP, use SDKFixed
auto-instrument-excludesilently ignored in whitelist modeLow Priority
Deleted unused
uninstrument_namespace()functionRelaxed
IsMatchregex fordeployment.environmentdetection^(prod|staging|...)$to prefix match^(prod|staging|...)([-_].*)?$staging-v2,prod-us-east,dev-team-amyproduction,unstagingDocumentation
last9.io/serviceannotation unconditionally overrides SDK-setservice.nameOTEL_SERVICE_NAMEis ignored7. Documentation Overhaul
Files Changed
instrumentation.yamlOTEL_SERVICE_NAMEvia fieldRef, enabled .NET, removed hardcodeddeployment.environment=local, addedNODE_OPTIONSfor ESM supportlast9-otel-collector-values.yamlservice.nameanddeployment.environmentauto-detection, fixed hardcodedcluster.name="minikube", relaxed namespace regexlast9-otel-setup.shauto-instrument,auto-instrument-excludeoptions with validation, removed PHP injection, deleted dead codeREADME.mdUsage
Override Examples
Test Plan
auto-instrument=allservice.nameis resolved from K8s labelsdeployment.environmentis set from namespace (not hardcodedlocal)last9.io/serviceandlast9.io/envannotationshttp.route(Express, Fastify, Koa, Hapi)--importflagcluster.nameis auto-detected (not hardcoded to "minikube")staging-v2,prod-us-east-1are detected correctlyauto-instrument-excludewith whitelist mode returns errorGenerated with Claude Code