[Infrastructure] Add OpenTelemetry tracing support for safety pipeline#20
[Infrastructure] Add OpenTelemetry tracing support for safety pipeline#20ariktadas144 wants to merge 15 commits into
Conversation
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds optional OpenTelemetry tracing: setup/shutdown, YAML/env config, optional dependency, instrumentation in HumaneProxy and SafetyPipeline (stage and finalize spans), FastAPI middleware/endpoints with per-request spans and auth, MCP/CLI telemetry initialization, storage default path rename, README updates, and comprehensive telemetry tests. ChangesOpenTelemetry Tracing Implementation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
humane_proxy/__init__.py (1)
102-149:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winGuard span attribute writes when tracer is unavailable.
On lines 127-130 and 144-147,
spanwill beNonewhen OpenTelemetry is not installed, causingspan.set_attribute(...)to raiseAttributeErrorand breakcheck()andcheck_async()in the default configuration.Proposed fix
def check(self, text: str, session_id: str = "programmatic") -> dict: """Run the synchronous safety pipeline on *text* (Stages 1+2). Returns ------- dict ``{"safe": bool, "category": str, "score": float, "triggers": list, "stage_reached": int, ...}`` """ with self._span("humane_proxy.proxy.check") as span: + if span is not None: - span.set_attribute( - "humane_proxy.session_id", - hashlib.sha256(session_id.encode("utf-8")).hexdigest(), - ) + span.set_attribute( + "humane_proxy.session_id", + hashlib.sha256(session_id.encode("utf-8")).hexdigest(), + ) result = self._pipeline.classify_sync(text, session_id) async def check_async(self, text: str, session_id: str = "programmatic") -> dict: """Run the full async safety pipeline on *text* (all 3 stages). Returns ------- dict Same as :meth:`check`, but potentially enriched with Stage-3 reasoning and higher accuracy. """ with self._span("humane_proxy.proxy.check_async") as span: + if span is not None: - span.set_attribute( - "humane_proxy.session_id", - hashlib.sha256(session_id.encode("utf-8")).hexdigest(), - ) + span.set_attribute( + "humane_proxy.session_id", + hashlib.sha256(session_id.encode("utf-8")).hexdigest(), + ) result = await self._pipeline.classify(text, session_id)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@humane_proxy/__init__.py` around lines 102 - 149, The span context manager _span(...) may yield None when no tracer is installed, so calling span.set_attribute(...) in check and check_async will raise AttributeError; fix by guarding those calls (e.g., after "with self._span(... ) as span:" check "if span is not None:" before calling span.set_attribute(...)) or call via a safe helper (e.g., getattr(span, 'set_attribute', lambda *a, **k: None)(...)); update both check (uses self._pipeline.classify_sync) and check_async (uses self._pipeline.classify) to use the guard.
🧹 Nitpick comments (4)
tests/test_telemetry.py (1)
117-125: ⚡ Quick winAvoid depending on private OpenTelemetry globals for provider reset.
Using private internals (
trace._TRACER_PROVIDER_SET_ONCE,trace._TRACER_PROVIDER) makes these tests version-fragile. Prefer a pytest fixture that monkeypatches tracer state via supported test utilities (or guards this fallback behind attribute checks) to reduce breakage on OpenTelemetry upgrades.Suggested direction
def setup_inmemory_tracing(): @@ - # The OpenTelemetry API only allows setting the tracer provider once. - # Reset the internal initialization guard in tests so the in-memory - # provider can be installed cleanly. - import opentelemetry.util._once as ot_once # type: ignore - - trace._TRACER_PROVIDER_SET_ONCE = ot_once.Once() - trace._TRACER_PROVIDER = None - - trace.set_tracer_provider(provider) + # Prefer supported reset strategy for test isolation. + # If private fallback is required, guard it to avoid hard failures + # when OpenTelemetry internals change across versions. + try: + trace.set_tracer_provider(provider) + except Exception: + # guarded fallback for legacy behavior + import opentelemetry.util._once as ot_once # type: ignore + if hasattr(trace, "_TRACER_PROVIDER_SET_ONCE"): + trace._TRACER_PROVIDER_SET_ONCE = ot_once.Once() + if hasattr(trace, "_TRACER_PROVIDER"): + trace._TRACER_PROVIDER = None + trace.set_tracer_provider(provider)Also applies to: 512-515
humane_proxy/middleware/interceptor.py (2)
192-203: ⚡ Quick winConsider using timing-safe comparison for API key validation.
The direct string comparison at line 203 is vulnerable to timing attacks. For security-sensitive API key validation, use
secrets.compare_digest()for constant-time comparison.🔒 Proposed fix
+import secrets + def _authorize(request: Request) -> bool: if not HUMANE_PROXY_API_KEY: return True auth = request.headers.get("Authorization", "") if not auth.startswith("Bearer "): return False token = auth.replace("Bearer ", "").strip() - return token == HUMANE_PROXY_API_KEY + return secrets.compare_digest(token, HUMANE_PROXY_API_KEY)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@humane_proxy/middleware/interceptor.py` around lines 192 - 203, The _authorize function currently uses direct equality (token == HUMANE_PROXY_API_KEY) which is vulnerable to timing attacks; change the comparison to a timing-safe one by importing the secrets module and using secrets.compare_digest(token, HUMANE_PROXY_API_KEY) instead of == in _authorize, ensuring you preserve the existing Bearer parsing logic and strip() behavior before comparing and handle the case where HUMANE_PROXY_API_KEY or token might be non-string by converting to str if needed.
220-232: ⚡ Quick winMove
hashlibimport to module level.The
import hashlibinside the request handler is executed on every authenticated request. Move it to module-level imports for better performance.♻️ Proposed fix
At module level (near line 5-8):
import uuid import time +import hashlibThen remove line 222 and update the hashing:
if span is not None: - import hashlib - safe_session = hashlib.sha256(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@humane_proxy/middleware/interceptor.py` around lines 220 - 232, The local import of hashlib inside the request handler should be moved to module-level imports to avoid repeated imports on every request; add "import hashlib" near the top of the module and remove the inline "import hashlib" in the block that computes safe_session (the code that calls hashlib.sha256(session_id.encode('utf-8')).hexdigest()); ensure the existing logic that gets the span via trace.get_current_span(), computes safe_session from session_id, and calls _set_attr(span, "humane_proxy.session_id", safe_session) remains unchanged except for using the module-level hashlib.humane_proxy/cli.py (1)
324-345: ⚖️ Poor tradeoffConsider using storage layer for consistency.
The
sessioncommand uses direct SQLite queries while the refactoredescalationscommand uses the storage layer abstraction. For consistency and maintainability, consider usingget_store()here as well.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@humane_proxy/cli.py` around lines 324 - 345, The session CLI handler currently opens sqlite3 directly (using _get_db_path and sqlite3.connect) instead of the storage abstraction; refactor the session function to call init_db() then obtain the storage via get_store() and query the escalations for the given session_id through the store API (e.g., store.get_escalations or store.query_escalations), extracting category, risk_score, timestamp and triggers from those returned records, and remove the direct sqlite3.connect/_get_db_path usage and manual conn.close handling; keep the function signature session(session_id: str) and preserve existing output formatting but source data from get_store() for consistency with the escalations command.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@humane_proxy/classifiers/pipeline.py`:
- Around line 287-325: The code is writing telemetry attributes to stage1_span
after its context has ended; change the _set_attr calls in the early-exit branch
(after calling self._finalize) to write to the still-active parent span
(variable named span) instead of stage1_span so attributes like
"humane_proxy.category", "humane_proxy.score", "humane_proxy.triggers_count",
"humane_proxy.stage_reached" and "humane_proxy.message_hash" (when
final.message_hash) are recorded; locate the early-exit block around the
stage1_span usage and replace the target span argument in each _set_attr
invocation from stage1_span to span.
In `@humane_proxy/middleware/interceptor.py`:
- Around line 107-117: Define and initialize _REQUEST_COUNT (e.g.,
_REQUEST_COUNT = 0) before the middleware function, remove the duplicate later,
and guard tracer usage from _get_tracer(): call tracer = _get_tracer() and if
tracer is not None use tracer.start_as_current_span("humane_proxy.http.request")
as the context manager, otherwise use a no-op context manager (e.g.,
contextlib.nullcontext()) so span_ctx is always a valid context to enter; update
references to span_ctx/with span_ctx as span accordingly.
---
Outside diff comments:
In `@humane_proxy/__init__.py`:
- Around line 102-149: The span context manager _span(...) may yield None when
no tracer is installed, so calling span.set_attribute(...) in check and
check_async will raise AttributeError; fix by guarding those calls (e.g., after
"with self._span(... ) as span:" check "if span is not None:" before calling
span.set_attribute(...)) or call via a safe helper (e.g., getattr(span,
'set_attribute', lambda *a, **k: None)(...)); update both check (uses
self._pipeline.classify_sync) and check_async (uses self._pipeline.classify) to
use the guard.
---
Nitpick comments:
In `@humane_proxy/cli.py`:
- Around line 324-345: The session CLI handler currently opens sqlite3 directly
(using _get_db_path and sqlite3.connect) instead of the storage abstraction;
refactor the session function to call init_db() then obtain the storage via
get_store() and query the escalations for the given session_id through the store
API (e.g., store.get_escalations or store.query_escalations), extracting
category, risk_score, timestamp and triggers from those returned records, and
remove the direct sqlite3.connect/_get_db_path usage and manual conn.close
handling; keep the function signature session(session_id: str) and preserve
existing output formatting but source data from get_store() for consistency with
the escalations command.
In `@humane_proxy/middleware/interceptor.py`:
- Around line 192-203: The _authorize function currently uses direct equality
(token == HUMANE_PROXY_API_KEY) which is vulnerable to timing attacks; change
the comparison to a timing-safe one by importing the secrets module and using
secrets.compare_digest(token, HUMANE_PROXY_API_KEY) instead of == in _authorize,
ensuring you preserve the existing Bearer parsing logic and strip() behavior
before comparing and handle the case where HUMANE_PROXY_API_KEY or token might
be non-string by converting to str if needed.
- Around line 220-232: The local import of hashlib inside the request handler
should be moved to module-level imports to avoid repeated imports on every
request; add "import hashlib" near the top of the module and remove the inline
"import hashlib" in the block that computes safe_session (the code that calls
hashlib.sha256(session_id.encode('utf-8')).hexdigest()); ensure the existing
logic that gets the span via trace.get_current_span(), computes safe_session
from session_id, and calls _set_attr(span, "humane_proxy.session_id",
safe_session) remains unchanged except for using the module-level hashlib.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f3c36f44-ada9-42ed-b0a0-a335767e272b
📒 Files selected for processing (13)
README.mdhumane_proxy.yamlhumane_proxy/__init__.pyhumane_proxy/classifiers/pipeline.pyhumane_proxy/cli.pyhumane_proxy/config.pyhumane_proxy/config.yamlhumane_proxy/mcp_server.pyhumane_proxy/middleware/interceptor.pyhumane_proxy/storage/sqlite.pyhumane_proxy/telemetry.pypyproject.tomltests/test_telemetry.py
|
@ariktadas144 telemetry test failed, so please fix it |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@humane_proxy/__init__.py`:
- Around line 122-127: The span attribute setting currently computes
hashlib.sha256(session_id...) and calls span.set_attribute even when the span is
non-recording; wrap the hashing and set_attribute behind a check for
span.is_recording() inside the with self._span("humane_proxy.proxy.check") block
(and apply the same pattern for the later block at the other occurrence), so
only when span is not None and span.is_recording() do you compute the hash and
call span.set_attribute to avoid work on NoOp spans.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ece0c8ee-3a32-41b3-a185-b2ffc807a37a
📒 Files selected for processing (4)
humane_proxy/__init__.pyhumane_proxy/classifiers/pipeline.pyhumane_proxy/middleware/interceptor.pytests/test_telemetry.py
🚧 Files skipped from review as they are similar to previous changes (3)
- humane_proxy/middleware/interceptor.py
- humane_proxy/classifiers/pipeline.py
- tests/test_telemetry.py
|
@ariktadas144 If possible, I'd really like to know the whole thought process behind each commit in this PR. Please explain your reasoning for changing the files you did and how you used AI help for the changes. Please also explain why |
Refactor MCP authentication handling and remove unused constants.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
humane_proxy/cli.py (1)
330-345: 💤 Low valueConsider using the storage layer abstraction for consistency.
The
escalationscommand usesget_store()for abstracted storage access, whilesessionuses direct SQLite via_get_db_path(). If a different storage backend is configured, this command would silently bypass it and query SQLite directly.If the
sessioncommand requires queries not yet exposed by the storage interface, this is fine for now, but worth tracking for future alignment.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@humane_proxy/cli.py` around lines 330 - 345, The session command is bypassing the storage abstraction by calling init_db() and sqlite3.connect(_get_db_path()) directly; replace this direct SQLite access with the storage layer (use get_store()) so the escalations query goes through the configured backend. Locate the block around init_db(), _get_db_path(), and the direct conn/execute call and refactor to call the store API (get_store()) to fetch escalations (or add a new store method if needed) rather than opening sqlite3 connections directly. Ensure the new code uses the store's query method and removes the direct sqlite3.connect usage so backends other than SQLite are respected.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@humane_proxy/cli.py`:
- Around line 330-345: The session command is bypassing the storage abstraction
by calling init_db() and sqlite3.connect(_get_db_path()) directly; replace this
direct SQLite access with the storage layer (use get_store()) so the escalations
query goes through the configured backend. Locate the block around init_db(),
_get_db_path(), and the direct conn/execute call and refactor to call the store
API (get_store()) to fetch escalations (or add a new store method if needed)
rather than opening sqlite3 connections directly. Ensure the new code uses the
store's query method and removes the direct sqlite3.connect usage so backends
other than SQLite are respected.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f82e91b3-0e51-4daf-8941-8bee6e019dea
📒 Files selected for processing (4)
README.mdhumane_proxy/__init__.pyhumane_proxy/cli.pyhumane_proxy/mcp_server.py
🚧 Files skipped from review as they are similar to previous changes (3)
- README.md
- humane_proxy/mcp_server.py
- humane_proxy/init.py
|
@Vishisht16 Before opening the PR initially, I had run the local validation flow on my machine, including pytest, Ruff, Black, and the other configured checks, and they were passing locally at that stage. The issues started after the CI failures appeared on GitHub. The initial telemetry-related failure showed a Where I made the mistake was after resolving the follow-up conflicts and applying those fixes, I did not rerun the full validation flow carefully enough before pushing the updated commits. The duplicated After identifying those problems, I went back through the affected files carefully, cleaned up the merge artifacts manually, removed the duplicated sections, and reran the full local validation flow again including Ruff, Black, pytest, and syntax/import checks to ensure the final state was clean and passing before pushing the latest fixes. So the original PR state had been validated locally before submission, and after the later CI/debugging issues I worked through the remaining problems and revalidated the corrected state thoroughly before the final updates. |
Vishisht16
left a comment
There was a problem hiding this comment.
Includes bugs with a bad commit history of fixing one thing while breaking another, which is typical of code written by AI without reading. Commits are also spaced 3-4 minutes apart. Correct practices were not followed and the code is still breaking in some places.
|
@ariktadas144 |
Summary (Closes #7)
This PR adds optional OpenTelemetry tracing support to HumaneProxy's safety pipeline.
Features
Added tracing spans for:
proxy.check()proxy.check_async()Telemetry behavior
Span attributes
Only approved attributes are emitted:
session_id(hashed)scorefinal_scorecategorystage_reachedtriggers_countmessage_hashAdditional changes
HUMANE_PROXY_TELEMETRY_ENABLEDenv overrideTesting
Added deterministic CI-safe tests using
InMemorySpanExportercovering:Validation
pytest→ 230 passed