Merged
28 changes: 8 additions & 20 deletions .claude/docs/features/owie-chat.md
@@ -135,33 +135,21 @@ pnpm chatlog:id <sessionId> # stream one session

## Observability (Langfuse)

All observability uses Langfuse direct HTTP ingestion, with no SDK (JS SDK silently fails in CF Workers, GitHub issue
#11984).
Uses OTEL approach: `instrumentation.ts` → `NodeTracerProvider` + `LangfuseSpanProcessor` → `observe()` in route handler → `experimental_telemetry: { isEnabled: true }` on `streamText`.

**File:** `lib/langfuse.ts`
**Full details:** `.claude/docs/learnings/langfuse-otel-tracing.md`

- `fetchSystemPrompt()`: fetches `chat-system-prompt?label=production`, 60s in-memory cache, falls back to hardcoded
  `SYSTEM_PROMPT` on error
- `sendChatTrace()`: sends trace with input/output/model/tokens/latency/finishReason/steps/promptVersion
**Files:**
- `instrumentation.ts`: registers OTEL provider at Next.js startup
- `lib/langfuse.ts`: `fetchSystemPrompt()` (prompt management) + `postFeedbackScore()` (user feedback scores)
- `app/api/chat/route.ts`: `observe()` wrapper, `propagateAttributes()`, `getActiveTraceId()`
- `app/api/chat/feedback/route.ts`: server proxy for posting feedback scores (keeps secret key server-side)
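
The `fetchSystemPrompt()` behaviour described above (60s in-memory cache, fallback to the hardcoded `SYSTEM_PROMPT` on error) could look roughly like the sketch below. This is not the contents of `lib/langfuse.ts`: `createPromptCache`, `PromptFetcher`, the injected clock, and the placeholder prompt text are all assumptions made for illustration, and the remote fetch is passed in rather than implemented.

```typescript
// Sketch only: `PromptFetcher`, `createPromptCache`, and the injected clock are
// hypothetical names, not the real lib/langfuse.ts implementation.
type PromptFetcher = () => Promise<string>;

const SYSTEM_PROMPT = 'You are a helpful assistant.'; // placeholder fallback text
const CACHE_TTL_MS = 60_000; // 60s, matching the description above

function createPromptCache(
  fetchRemote: PromptFetcher,
  now: () => number = Date.now,
): () => Promise<string> {
  let cached: { value: string; expiresAt: number } | undefined;

  return async function fetchSystemPrompt(): Promise<string> {
    // Serve from memory while the cached entry is still fresh.
    if (cached && cached.expiresAt > now()) return cached.value;
    try {
      const value = await fetchRemote();
      cached = { value, expiresAt: now() + CACHE_TTL_MS };
      return value;
    } catch {
      // Any fetch failure falls back to the hardcoded prompt (which is not cached).
      return SYSTEM_PROMPT;
    }
  };
}
```

Injecting the fetcher and the clock keeps the cache logic testable without a network call or real timers.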

**Trace fields:**

| Field | Source |
|-----------------|--------------------------------------------|
| `input` | Last user message text |
| `output` | Full assistant response |
| `model` | `CHAT_MODEL` / `DEFAULT_MODEL` env var |
| `tokens.input` | `usage.inputTokens` from AI SDK `onFinish` |
| `tokens.output` | `usage.outputTokens` |
| `latencyMs` | `Date.now() - startTime` |
| `finishReason` | From `onFinish` |
| `steps` | `steps.length` (tool call count) |
| `promptVersion` | From Langfuse prompt fetch |
**User feedback:** 👍👎 buttons in assistant action bar → `FeedbackAdapter` in `chat-runtime.tsx` → `/api/chat/feedback` → `POST /api/public/scores` with `name: 'user-feedback'`, value `1` or `0`.

**LLM-as-Judge evaluator** configured in Langfuse UI → Evaluators:

- Target: Live Traces, filter trace name = `chat`
- Model: set up in Langfuse UI
- Scores appear in Evaluation → Scores automatically

**Env vars:**
3 changes: 2 additions & 1 deletion .claude/docs/learnings/README.md
@@ -10,4 +10,5 @@ Updated as new things are learned. Review anytime to consolidate understanding.
| [static-export-constraints.md](static-export-constraints.md) | What `output: 'export'` means, what's not allowed, build-time data fetching pattern |
| [seo-jsonld-pattern.md](seo-jsonld-pattern.md) | JSON-LD structured data, OG metadata, per-page builder pattern |
| [local-first-wallet.md](local-first-wallet.md) | IndexedDB, Dexie.js, why there's no server/login, client component rule |
| [mdx-blog-pipeline.md](mdx-blog-pipeline.md) | MDX format, frontmatter fields, categories, image conventions, validation |
| [mdx-blog-pipeline.md](mdx-blog-pipeline.md) | MDX format, frontmatter fields, categories, image conventions, validation |
| [langfuse-otel-tracing.md](langfuse-otel-tracing.md) | OTEL vs manual HTTP, instrumentation.ts setup, observe() pattern, traceId→feedback flow, score API |
183 changes: 183 additions & 0 deletions .claude/docs/learnings/langfuse-otel-tracing.md
@@ -0,0 +1,183 @@
# Langfuse + Vercel AI SDK: OTEL Tracing

## The two approaches

### Approach A: OTEL (canonical, current)
`instrumentation.ts` → `NodeTracerProvider` + `LangfuseSpanProcessor` → `observe()` wrapper → `experimental_telemetry: { isEnabled: true }` on `streamText`.

### Approach B: Manual HTTP (old, removed)
Raw `fetch()` to `/api/public/ingestion`. It was added ad hoc for early features, with no consistent pattern, and has been replaced by Approach A.

**Always use A.** B looks simple but misses auto token counts, span hierarchy, tool call spans, and step-level tracing, all of which OTEL gives for free via `experimental_telemetry`.

---

## Setup: instrumentation.ts

Must use `NodeTracerProvider` directly. Do NOT use `@vercel/otel`: Langfuse docs explicitly say to avoid it.

```ts
import { LangfuseSpanProcessor } from '@langfuse/otel';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { resourceFromAttributes } from '@opentelemetry/resources'; // v2 API, NOT `new Resource({})`

export const langfuseSpanProcessor = new LangfuseSpanProcessor();

const tracerProvider = new NodeTracerProvider({
  resource: resourceFromAttributes({
    'service.name': process.env.VERCEL_ENV ?? 'development',
  }),
  spanProcessors: [langfuseSpanProcessor],
});

tracerProvider.register();
```

**`@opentelemetry/resources` v2 breaking change:** `new Resource({})` removed, replaced by `resourceFromAttributes({})`.

**`VERCEL_ENV`** = `production` / `preview` / `development`. Use it as `service.name` to filter traces by environment in Langfuse UI. No extra env var needed.

Export `langfuseSpanProcessor` from here so the route can call `forceFlush()` via `after()`.

---

## Setup: route handler

```ts
import { observe, propagateAttributes, setActiveTraceIO, getActiveTraceId } from '@langfuse/tracing';
import { trace } from '@opentelemetry/api';
import { streamText } from 'ai';
import { langfuseSpanProcessor } from '@/instrumentation';
import { after } from 'next/server';

// `body`, `lastUserMessage`, `messages`, `model`, `ip`, `promptVersion`, `messageCount`,
// and the `openrouter` provider come from request parsing and provider setup (omitted here).
const handler = async (req: Request) => {
  // set input before propagateAttributes
  setActiveTraceIO({ input: lastUserMessage });

  return await propagateAttributes(
    {
      traceName: 'chat',
      sessionId: body.sessionId,
      userId: body.userId,
      tags: ['web-chat'],
      // all metadata values must be string (propagateAttributes type constraint)
      metadata: { model, ip, promptVersion: String(promptVersion), messageCount: String(messageCount) },
    },
    async () => {
      const traceId = getActiveTraceId(); // capture before streamText

      const result = streamText({
        model: openrouter(model),
        messages,
        experimental_telemetry: { isEnabled: true }, // auto-captures tokens, steps, tool calls
        onFinish: async ({ text }) => {
          setActiveTraceIO({ output: text });
          trace.getActiveSpan()?.end(); // must end span manually (endOnExit: false)
        },
        onError: async () => {
          trace.getActiveSpan()?.end();
        },
      });

      result.consumeStream(); // no await; ensures onFinish fires if client disconnects

      after(async () => await langfuseSpanProcessor.forceFlush()); // required for serverless

      return result.toUIMessageStreamResponse({
        messageMetadata: ({ part }) => {
          if (part.type === 'finish') return { custom: { traceId } }; // expose traceId to frontend
          return undefined;
        },
      });
    }
  );
};

export const POST = observe(handler, {
  name: 'handle-chat-message',
  endOnExit: false, // keep span open until stream finishes (onFinish calls span.end())
});
```

**`endOnExit: false`** is critical: without it the span closes before `onFinish` fires and output is never recorded.
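
The timing problem can be shown with a toy model that uses no tracing library at all. `FakeSpan` and `observeLike` below are hypothetical stand-ins, not the `@langfuse/tracing` or OpenTelemetry APIs. The sketch only assumes that an ended span ignores later updates, and that the handler returns before the stream's finish callback runs.

```typescript
// Toy model only: `FakeSpan` and `observeLike` are hypothetical stand-ins,
// not the @langfuse/tracing or OpenTelemetry APIs.
class FakeSpan {
  ended = false;
  output: string | undefined = undefined;

  setOutput(text: string): void {
    // Ended spans ignore further updates.
    if (!this.ended) this.output = text;
  }

  end(): void {
    this.ended = true;
  }
}

// Runs `handler`, then ends the span on return only when `endOnExit` is true.
function observeLike(
  handler: (span: FakeSpan) => () => void,
  options: { endOnExit: boolean },
): { span: FakeSpan; finishStream: () => void } {
  const span = new FakeSpan();
  const finishStream = handler(span);
  if (options.endOnExit) span.end();
  return { span, finishStream };
}

// The handler returns immediately; the stream's finish callback runs later.
const handler = (span: FakeSpan) => () => {
  span.setOutput('full assistant response');
  span.end();
};

const eager = observeLike(handler, { endOnExit: true });
eager.finishStream();
console.log(eager.span.output); // undefined: the span had already ended

const deferred = observeLike(handler, { endOnExit: false });
deferred.finishStream();
console.log(deferred.span.output); // "full assistant response"
```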

**`after()` + `forceFlush()`** is required in serverless (Vercel functions). Without it, the function exits before spans are flushed to Langfuse.

**`metadata` values must all be strings**: `propagateAttributes` types `metadata` as `Record<string, string>`. Pass numbers with `String()`.
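
If several call sites build metadata, one way to keep the string constraint in a single place is a small conversion helper. The sketch below is an assumption for illustration: `toStringMetadata` and `MetadataInput` are hypothetical names that do not appear in the codebase, and dropping `null`/`undefined` values is a design choice, not a documented requirement.

```typescript
// Hypothetical helper; `toStringMetadata` and `MetadataInput` are illustrative names.
type MetadataInput = Record<string, string | number | boolean | null | undefined>;

function toStringMetadata(input: MetadataInput): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(input)) {
    // Drop missing values instead of sending the strings "undefined" or "null".
    if (value === undefined || value === null) continue;
    out[key] = String(value);
  }
  return out;
}

const metadata = toStringMetadata({
  model: 'some-model',
  promptVersion: 7,
  messageCount: 3,
  ip: undefined,
});
// metadata is { model: 'some-model', promptVersion: '7', messageCount: '3' }
```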

---

## traceId → frontend → feedback

`getActiveTraceId()` captures the active trace ID inside `propagateAttributes()`. Inject into `messageMetadata` on the `finish` part → flows to `message.metadata.custom.traceId` on the client.

Read in `FeedbackAdapter`:

```ts
adapters: {
  feedback: {
    submit: async ({ type, message }) => {
      const traceId = (message.metadata as { custom?: { traceId?: string } })?.custom?.traceId;
      if (!traceId) return; // silently skip if no traceId (e.g. old messages)
      await fetch('/api/chat/feedback', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ traceId, value: type === 'positive' ? 1 : 0 }),
      });
    },
  },
},
```
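
The inline type assertion in the adapter can be swapped for a runtime check so that malformed metadata never reaches the feedback route. The following is a minimal sketch under stated assumptions: `readTraceId` and `feedbackValue` are hypothetical helper names, and `'negative'` as the other feedback type is assumed (the source only shows `'positive'`).

```typescript
// Hypothetical helpers; `readTraceId` and `feedbackValue` are illustrative names.
function readTraceId(metadata: unknown): string | undefined {
  if (typeof metadata !== 'object' || metadata === null) return undefined;
  const custom = (metadata as { custom?: unknown }).custom;
  if (typeof custom !== 'object' || custom === null) return undefined;
  const traceId = (custom as { traceId?: unknown }).traceId;
  // Treat empty strings and non-string values as "no trace".
  return typeof traceId === 'string' && traceId.length > 0 ? traceId : undefined;
}

// Maps the adapter's feedback type to the numeric score value (1 or 0).
// The 'negative' literal is an assumption; the source only shows 'positive'.
function feedbackValue(type: 'positive' | 'negative'): 0 | 1 {
  return type === 'positive' ? 1 : 0;
}
```

Inside `submit`, this would replace the cast with `const traceId = readTraceId(message.metadata);`, keeping the same early return when no trace ID is present.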

**Never expose `LANGFUSE_SECRET_KEY` to frontend.** Use a server proxy route (`/api/chat/feedback`) that calls `POST /api/public/scores`.

---

## Posting scores (feedback)

```ts
// POST /api/public/scores, NOT batch ingestion
// Batch ingestion silently accepts scores but never stores them
await fetch(`${baseUrl}/api/public/scores`, {
  method: 'POST',
  headers: { Authorization: basicAuth(), 'Content-Type': 'application/json' },
  body: JSON.stringify({
    traceId,
    name: 'user-feedback',
    value, // 0 or 1
    dataType: 'NUMERIC',
  }),
});
```
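
The snippet above leaves `basicAuth()` and `baseUrl` undefined. The sketch below shows one way the request could be assembled, assuming the Langfuse public API's Basic auth scheme (public key as username, secret key as password). `ScoreConfig`, `basicAuthHeader`, and `buildScoreRequest` are hypothetical names, not the real `postFeedbackScore()` from `lib/langfuse.ts`. The request is built but not sent, so it can be inspected without a network call.

```typescript
// Sketch only: `ScoreConfig`, `basicAuthHeader`, and `buildScoreRequest` are
// hypothetical names, not the helpers in lib/langfuse.ts.
interface ScoreConfig {
  baseUrl: string;   // Langfuse host for the project
  publicKey: string; // Basic auth username
  secretKey: string; // Basic auth password; must stay server-side
}

interface ScoreRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function basicAuthHeader(publicKey: string, secretKey: string): string {
  // btoa is sufficient here because the keys are assumed to be plain ASCII.
  return `Basic ${btoa(`${publicKey}:${secretKey}`)}`;
}

function buildScoreRequest(
  config: ScoreConfig,
  traceId: string,
  value: 0 | 1,
  comment?: string,
): ScoreRequest {
  const body: Record<string, string | number> = {
    traceId,
    name: 'user-feedback',
    value,
    dataType: 'NUMERIC',
  };
  if (comment) body.comment = comment; // optional free-text comment

  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${config.baseUrl.replace(/\/+$/, '')}/api/public/scores`,
    init: {
      method: 'POST',
      headers: {
        Authorization: basicAuthHeader(config.publicKey, config.secretKey),
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}
```

A caller would then send it with `const { url, init } = buildScoreRequest(config, traceId, 1); await fetch(url, init);` and check the response status before treating the score as stored.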

---

## What gets auto-tracked by OTEL

When `experimental_telemetry: { isEnabled: true }` is set on `streamText`, Langfuse auto-captures:
- Token counts (input/output/total)
- Model name
- Latency
- Tool calls (name + args + result per step)
- Step count
- Finish reason

Manual additions via `propagateAttributes` + `setActiveTraceIO`:
- `sessionId`, `userId`, `tags`
- `metadata.model`, `metadata.ip`, `metadata.promptVersion`, `metadata.messageCount`
- `input` (last user message text)
- `output` (full assistant response)

User feedback:
- Score name: `user-feedback`, value `1` (positive) or `0` (negative), dataType `NUMERIC`

---

## Packages

```
@langfuse/otel - LangfuseSpanProcessor
@langfuse/tracing - observe(), propagateAttributes(), setActiveTraceIO(), getActiveTraceId()
@opentelemetry/api - trace.getActiveSpan()
@opentelemetry/sdk-trace-node - NodeTracerProvider
@opentelemetry/resources - resourceFromAttributes (v2)
```
18 changes: 18 additions & 0 deletions app/api/chat/feedback/route.ts
@@ -0,0 +1,18 @@
import { postFeedbackScore } from '@/lib/langfuse';

export async function POST(req: Request) {
  const body = await req.json() as { traceId?: string; value?: number; comment?: string };

  if (!body.traceId || typeof body.value !== 'number') {
    return new Response(JSON.stringify({ error: 'traceId and value required' }), {
      status: 400,
      headers: { 'Content-Type': 'application/json' },
    });
  }

  const value = body.value === 1 ? 1 : 0;

  await postFeedbackScore(body.traceId, value, body.comment);

  return new Response(null, { status: 204 });
}