Wire up claude-mem session summaries (fix dead /complete trigger) #22
thedotmack wants to merge 31 commits
* feat: ingest Gemini Live conversations into the claude-mem pipeline Forward Gemini Live conversation turns and tool calls into the normal claude-mem worker pipeline (POST /api/sessions/observations -> queue -> LLM extraction -> observations), the same path a Claude Code PostToolUse hook uses. No bespoke storage layer. - Add claude_mem_sink.py: opt-in (CLAUDE_MEM_ENABLED), fail-soft async sink. Inits a session, flushes one tool-use observation per completed turn, and forwards Gemini tool_call events 1:1. Network errors are swallowed so the live audio session is never disturbed. - Wire the sink into gemini_live.py at the single drain-loop chokepoint, covering both the web /ws and Twilio phone paths with one insertion. - Add httpx dependency; ignore .scratch/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address Greptile review on claude_mem_sink - Remove unused asyncio import. - Log non-2xx worker responses at WARNING (still fail-soft, no raise). - Flush pending transcript before recording a tool_call observation so observations stay in chronological order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
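The fail-soft behavior described above (swallow network errors, log non-2xx responses at WARNING, never raise into the live session) can be sketched roughly as follows. This is a minimal illustration, not the code in claude_mem_sink.py: `fail_soft` and the three fake `send` coroutines are hypothetical names, and a real sink would pass a coroutine that performs the HTTP POST.

```python
import asyncio
import logging

log = logging.getLogger("claude_mem_sink_sketch")  # hypothetical logger name


async def fail_soft(send, payload) -> bool:
    """Await send(payload); log and swallow any failure instead of raising."""
    try:
        status = await send(payload)
    except Exception as exc:  # network errors must never reach the audio loop
        log.warning("claude-mem worker unreachable (ignored): %s", exc)
        return False
    if not 200 <= status < 300:
        log.warning("claude-mem worker returned %s (ignored)", status)
        return False
    return True


# Hypothetical stand-ins for the real HTTP call.
async def _worker_down(payload):
    raise ConnectionError("worker offline")


async def _worker_rejects(payload):
    return 500


async def _worker_ok(payload):
    return 200


results = [
    asyncio.run(fail_soft(send, {"tool": "demo"}))
    for send in (_worker_down, _worker_rejects, _worker_ok)
]
```

The caller only ever sees a boolean, so a down worker costs one log line and nothing else.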
Document the camera/screen -> Gemini Live video path (capture at 640x480
JPEG/1 FPS -> WS {type:"image"} -> video_input_queue -> send_video ->
send_realtime_input), with per-hop verification (log lines to grep),
camera/screen test steps, the secure-context requirement, how it relates
to claude-mem transcript ingestion, and a troubleshooting table. All
references cite exact file:line.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Standalone claude-mem worker (host :37778 -> container :37777) that ingests Gemini Live sessions without touching the host worker on :37777. - start.sh: sources GEMINI_API_KEY from ../.env (single source of truth) and injects it into the runtime settings the container reads; clears the stale worker.pid that made restarts exit 0 (pid=1 always "alive" in a fresh container); runs worker-service.cjs in the foreground. - gemini-live.json: "presence" mode whose taxonomy captures the person and the people with them from visual, behavioral, and conversational signal (person/companion/appearance/behavior/environment/conversation). - settings.json: committed as a key-free template (key comes from .env). - .gitignore: keep data/ out; the template settings.json is now tracked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wire the autonomous GeminiLiveVision captioner into the Gemini Live send_video loop, add the observer-mode config and frontend surface for live presence notes, and document the design + a timeline report of the first successful live demo session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the session talks about planning an event, detect it in the turn transcript, extract the details (title/date/time/location/host) via the flash model, render an invitation image with Gemini's flash image model (gemini-3.1-flash-image-preview, legible in-image text), and push it to the frontend over the existing WebSocket as an `event_invitation` event. - claude_mem_sink.py: detection regex, extraction, image render (handles base64 inline_data per stored API docs), emit-back channel, fail-soft. - gemini_live.py: wire sink.emit to the event queue. - frontend: render an invitation card (main.js + style.css). - gemini-live.json v2.1.0: capture demo-discussed observation targets (names, headcount, expressions, accessories, atmosphere) + event planning. - docs: event-invitation-trigger-plan.md (API decisions sourced from mem-search). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… generator Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Observe-then-push: the worker classifies an exposed secret as a `security_alert` observation and the battle-tested TelegramNotifier pushes it — zero new notifier code, mirroring the canonical security_alert brainbeat in code.json. - gemini-live.json v2.2.0: register security_alert (🚨) / security_note (🔐) observation types with verbatim code.json descriptions. No prompt prose — the type registration alone is enough for the extractor to emit them. - settings.json: documented Telegram placeholders (no secrets committed). - start.sh: inject Telegram creds from ~/.claude-mem at launch (self-disables if absent so no junk requests); add WORKER_SCRIPTS_DIR to mount a telegram-capable worker bundle over the stale baked image (v12.1.0 predates the notifier). - claude_mem_sink.py / gemini_live.py: memory-recall tools + session-start context injection (get_memory_timeline / get_memory_observations). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ker (#3) * docs: add video ingestion run-and-test guide Document the camera/screen -> Gemini Live video path (capture at 640x480 JPEG/1 FPS -> WS {type:"image"} -> video_input_queue -> send_video -> send_realtime_input), with per-hop verification (log lines to grep), camera/screen test steps, the secure-context requirement, how it relates to claude-mem transcript ingestion, and a troubleshooting table. All references cite exact file:line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add isolated claude-mem docker worker for Gemini Live presence capture Standalone claude-mem worker (host :37778 -> container :37777) that ingests Gemini Live sessions without touching the host worker on :37777. - start.sh: sources GEMINI_API_KEY from ../.env (single source of truth) and injects it into the runtime settings the container reads; clears the stale worker.pid that made restarts exit 0 (pid=1 always "alive" in a fresh container); runs worker-service.cjs in the foreground. - gemini-live.json: "presence" mode whose taxonomy captures the person and the people with them from visual, behavioral, and conversational signal (person/companion/appearance/behavior/environment/conversation). - settings.json: committed as a key-free template (key comes from .env). - .gitignore: keep data/ out; the template settings.json is now tracked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add live vision-observation pipeline, frontend, and demo journey report Wire the autonomous GeminiLiveVision captioner into the Gemini Live send_video loop, add the observer-mode config and frontend surface for live presence notes, and document the design + a timeline report of the first successful live demo session. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add event-planning trigger: auto-generate an invitation image When the session talks about planning an event, detect it in the turn transcript, extract the details (title/date/time/location/host) via the flash model, render an invitation image with Gemini's flash image model (gemini-3.1-flash-image-preview, legible in-image text), and push it to the frontend over the existing WebSocket as an `event_invitation` event. - claude_mem_sink.py: detection regex, extraction, image render (handles base64 inline_data per stored API docs), emit-back channel, fail-soft. - gemini_live.py: wire sink.emit to the event queue. - frontend: render an invitation card (main.js + style.css). - gemini-live.json v2.1.0: capture demo-discussed observation targets (names, headcount, expressions, accessories, atmosphere) + event planning. - docs: event-invitation-trigger-plan.md (API decisions sourced from mem-search). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add "Memi's First Memories" slide deck of the demo journey report Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add security_alert → Telegram brainbeat to the Gemini Live docker worker Observe-then-push: the worker classifies an exposed secret as a `security_alert` observation and the battle-tested TelegramNotifier pushes it — zero new notifier code, mirroring the canonical security_alert brainbeat in code.json. - gemini-live.json v2.2.0: register security_alert (🚨) / security_note (🔐) observation types with verbatim code.json descriptions. No prompt prose — the type registration alone is enough for the extractor to emit them. - settings.json: documented Telegram placeholders (no secrets committed). 
- start.sh: inject Telegram creds from ~/.claude-mem at launch (self-disables if absent so no junk requests); add WORKER_SCRIPTS_DIR to mount a telegram-capable worker bundle over the stale baked image (v12.1.0 predates the notifier). - claude_mem_sink.py / gemini_live.py: memory-recall tools + session-start context injection (get_memory_timeline / get_memory_observations). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit and consolidate every LLM-facing prompt, tool description, and
spoken/greeting string used by the running app into a single
gemini-live-genai-python-sdk/prompts.json, loaded once (fail-fast) by a
new prompts.py. No prompt text remains hard-coded in the Python modules.
Centralized from:
- gemini_live.py: assistant system instruction, memory-context injection
(placeholder {session_start_context}), memory-recall instructions, voice
- claude_mem_sink.py: vision captioner (+ NO_CHANGE token), event-detail
extraction, invitation-image render, the two memory-recall tool
descriptions, and the session-init label
- twilio_handler.py: phone-call greeting
- main.py: TwiML "Connecting to Gemini Live" spoken line
The claude-mem observer mode (claude-mem-docker/gemini-live.json) is
intentionally NOT folded in: it is consumed by the separate claude-mem
worker process, not this app, across the :37777 REST boundary.
Adds PROMPTS.md documenting every prompt (in interaction order), the full
stack, the models in use, and how to edit prompts.json safely. Also
gitignores the scratch tmp/ server log.
Net -24 lines of code despite the indirection. Verified: prompts.json
loads, all placeholders format end-to-end, and all four modules import
cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Track the four demo/documentation artifacts derived from the isolated claude-mem Docker worker's observation database (420 observations, 16 sessions, May 23 2026): the raw JSON export, the Markdown digest, the narrative essay, and the slide deck. - Slide deck recompressed at JPEG quality 80 (13 MB -> 0.56 MB) by rasterizing each page at 150 dpi; text and illustrations remain legible. - Deliberately-fake test credentials (sk_live_, AKIA, the AWS docs example secret) used to exercise the security-alert path were redacted to placeholders in the digest and JSON so they don't trip secret scanning; the alerts' narrative is preserved. - Adds reports/README.md describing all four files, the dataset, and the redaction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- claude_mem_sink.py: drop the unused CLAUDE_MEM_VISION_INTERVAL_SECONDS env knob and inline the 5s captioner cadence as a constant - docs: observation strategy + actions-system design and execution plan Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The isolated claude-mem worker was silently generating observations with gemini-2.5-flash, not Claude — the "gemini-live" MODE name was conflated with the observer PROVIDER. Yesterday's entire demo (785 obs) was Gemini-written. - settings.json: provider=claude, CLAUDE_MEM_MODEL=claude-sonnet-4-6, auth_method=subscription (matches the host worker). - start.sh: extract the host Claude OAuth creds (keychain 'Claude Code-credentials', else ~/.claude/.credentials.json) and mount them read-only so the container's subscription auth works — mirrors claude-mem's shipped docker/claude-mem/run.sh. Rewrote the header comment that kept "proving" gemini was intentional (MODE != PROVIDER), and made the empty SCRIPTS_MOUNT array expansion bash-3.2 safe. - gitignore the transient .auth/ creds dir and db-backups/. Verified: new observations now carry generated_by_model=claude-sonnet-4-6; worker stable (RestartCount=0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e-prompts # Conflicts: # gemini-live-genai-python-sdk/claude-mem-docker/.gitignore # gemini-live-genai-python-sdk/claude-mem-docker/settings.json # gemini-live-genai-python-sdk/claude-mem-docker/start.sh # gemini-live-genai-python-sdk/claude_mem_sink.py # gemini-live-genai-python-sdk/gemini_live.py
Greptile flagged three sites where str.format() interpolated untrusted
runtime content (LLM captions, API-fetched memory, rolling conversation)
into prompt templates. Any {key} pattern in that content triggered a
KeyError — silently and permanently disabling the vision captioner, or
tearing down the Gemini Live session at startup.
Replace .format() with literal .replace("{placeholder}", value) at all
three sites. In the VISION_PROMPT chain the trusted {no_change_token} is
replaced before the untrusted {prev}, so injected braces cannot trigger a
second substitution.
- claude_mem_sink.py:240 VISION_PROMPT (prev, no_change_token)
- claude_mem_sink.py:322 event_invitation extraction_prompt (conversation)
- gemini_live.py:50 memory_context_section (session_start_context)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
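A small sketch of the difference, under stated assumptions: the template text and the `render_literal` helper are illustrative, and the failing `.format()` call shows one way the KeyError can arise (runtime text ends up inside the string that `.format()` scans); the original call sites are not shown in this thread.

```python
TEMPLATE = "Previous caption: {prev}\nReply {no_change_token} if nothing changed."


def render_literal(template: str, prev: str, no_change_token: str) -> str:
    """Substitute placeholders literally; braces in the values are never parsed."""
    # Trusted token first, so a "{no_change_token}" inside prev stays literal.
    rendered = template.replace("{no_change_token}", no_change_token)
    return rendered.replace("{prev}", prev)


# Untrusted runtime text (e.g. an LLM caption) that happens to contain braces.
untrusted = "sign reads {user_id} and {no_change_token}"

# Failure mode: the untrusted text is part of the string being formatted.
try:
    ("Caption so far: " + untrusted).format(no_change_token="NO_CHANGE")
    format_failed = False
except KeyError:
    format_failed = True

safe = render_literal(TEMPLATE, untrusted, "NO_CHANGE")
```

With `.replace()` the injected `{user_id}` and `{no_change_token}` survive as plain text, and the real token appears exactly once.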
Centralize all app prompts into one prompts.json
Single-container image runs the Gemini Live FastAPI app plus a claude-mem worker (provider=gemini, Chroma off) so memory works in the cloud without Claude OAuth. Adds combined Dockerfile, entrypoint, fly.toml (always-on single machine + persistent volume for the memory DB), and a deploy guide. https://claude.ai/code/session_01KMmTdL5PjucWj8kZS2fA3H
…kbmcL Add Fly.io deployment with bundled claude-mem worker
Deploys the app + claude-mem worker to Fly on push to main (or manual dispatch), running on GitHub runners that can reach Fly. Idempotently creates the app and memory volume, stages the Gemini key, and deploys via remote builder. Requires repo secrets FLY_API_TOKEN and GEMINI_API_KEY. https://claude.ai/code/session_01KMmTdL5PjucWj8kZS2fA3H
…kbmcL Auto-deploy to Fly via GitHub Actions
Add a fail-soft background poller in MemorySink (mirrors _caption_loop) that reads newly-extracted observations back from the claude-mem worker (GET /api/observations?project=...) and pushes each through the existing self.emit channel as an "observation" event, the same channel the event invitation already uses. Seeds a high-water mark at session start so the feed streams only memories formed during this session.
No new storage: it surfaces the observations this pipeline already records, so the browser can show the memory building in real time (the demo's "notes on screen"). Controlled by CLAUDE_MEM_MEMORY_FEED_ENABLED (default on) and fully fail-soft: a down or unreachable worker never disturbs the live audio session.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
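The high-water-mark idea can be sketched as a pure filtering step. This assumes each observation carries a monotonically increasing integer `id`, which the commit message does not state; `new_since` is a hypothetical helper, not a function in the sink.

```python
from typing import Dict, List, Tuple


def new_since(observations: List[Dict], high_water: int) -> Tuple[List[Dict], int]:
    """Return observations newer than high_water (oldest first) and the new mark."""
    fresh = sorted(
        (obs for obs in observations if obs["id"] > high_water),
        key=lambda obs: obs["id"],
    )
    new_mark = fresh[-1]["id"] if fresh else high_water
    return fresh, new_mark


# Seed the mark at session start so earlier memories are not replayed.
existing = [{"id": 7, "title": "old"}, {"id": 9, "title": "older session"}]
_, mark = new_since(existing, 0)

# A later poll returns the old rows plus two new ones, in arbitrary order.
polled = existing + [{"id": 12, "title": "b"}, {"id": 10, "title": "a"}]
fresh, mark_after = new_since(polled, mark)
fresh_ids = [obs["id"] for obs in fresh]
```

Each poll emits only `fresh` and stores `mark_after`, so a repeated poll with no new rows emits nothing.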
Render the "observation" events streamed by the sink into a new Memory panel under the camera in the left pane: each memory is a card with its type emoji (matching the gemini-live.json mode), title, and subtitle, popped in as it forms. Mirrors the existing appendInvitation pattern; uses textContent only (observations are model-generated text). Resets with the session. This is the demo's "notes displayed live on screen" — the agent's memory made visible in the app that already exists, not a new app. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The docker worker was switched to Claude (claude-sonnet-4-6) in 3a05eb9 on the assumption the demo's gemini provider was a misconfiguration. It wasn't: the hackathon demo ran observation extraction on gemini-2.5-flash and it was noticeably faster. Switch the observer back to gemini and stop start.sh from hard-requiring Claude OAuth creds. - settings.json: CLAUDE_MEM_PROVIDER claude -> gemini; drop the claude-only auth/model keys (gemini uses GEMINI_API_KEY from .env). - start.sh: gate OAuth cred extraction + mount behind provider=claude so a gemini worker launches on the API key alone; fix the misleading header. Verified: worker relaunches healthy on :37778 with provider=gemini, gemini-2.5-flash in the rendered runtime settings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live memory feed + restore Gemini observer provider
The public demo ran on a single shared free-tier server key, which a single
live session exhausts in seconds (429 quota), killing memory observations for
everyone. This makes each visitor supply their own Gemini API key in the UI;
the live session AND the memory observations run on that key, isolated per key.
Backend (gemini-live-genai-python-sdk):
- main.py: read a mandatory {"type":"setup","api_key":"..."} as the FIRST WS
frame (body, never a query param/log) and build the live client with it;
reject the connection if absent. Surface a live-session failure (e.g. a key
without Gemini Live access) as a visible error instead of a silent dead
session.
- claude_mem_sink.py: thread the visitor key in; normalize once; derive a
per-visitor project namespace gemini-live-<sha256(key)[:12]> (one-way +
deterministic, so a returning visitor recovers their own memories and two
keys never see each other's feed); send geminiApiKey at /api/sessions/init.
- gemini_live.py: pass the session key to the sink.
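As a rough sketch of the namespace derivation in the claude_mem_sink.py item above: the `gemini-live-<sha256(key)[:12]>` shape comes from the commit message, while the function name and the normalization step (stripping whitespace) are assumptions.

```python
import hashlib


def project_namespace(api_key: str) -> str:
    """Derive a one-way, deterministic per-visitor project name from the key."""
    normalized = api_key.strip()  # assumed normalization; the real rule may differ
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"gemini-live-{digest[:12]}"


ns_first_visit = project_namespace("example-key-A")
ns_return_visit = project_namespace("  example-key-A\n")  # same key, pasted sloppily
ns_other_visitor = project_namespace("example-key-B")
```

The same key always maps to the same project, so a returning visitor lands in their own memories, and the key itself cannot be read back from the project name.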
Worker (vendored): claude-mem 13.3.0 rebuilt from a clean checkout with a
per-session geminiApiKey override (getGeminiConfig prefers it; held in-memory
only, never persisted). Vendored as claude-mem-docker/worker-service.cjs and
overwritten into the image; reproducible via claude-mem-docker/worker-byo-key.patch.
Image (Dockerfile, docker-entrypoint.sh): copy the patched worker over the
published one; boot the worker with a clearly-fake placeholder key (NOT empty —
an empty key makes the worker fall back to the Claude SDK, which has no creds in
this image; the placeholder keeps the Gemini provider selected and is overridden
per session). GEMINI_API_KEY is now optional (Twilio-only).
Frontend: mandatory password key input gating Connect, persisted in
localStorage (paste-once), sent as the first WS frame; error surfaced on reject.
Also includes the previously-uncommitted WebSocket-disconnect mid-session fix
(client_disconnected handling + /api/sessions/complete).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per-visitor isolation: claude-mem project namespace derived from sha256(gemini_api_key)[:12], so each BYO key gets its own memory and returning visitors recover prior observations. Batch reads now scope to the caller's namespace, closing a cross-tenant read leak. Worker resilience: docker-entrypoint.sh gains a start_worker()/supervise_worker() loop that clears stale pid and respawns the worker on health failure; the Python sink re-inits with the session key when the worker pid changes. These changes were already deployed to Fly v5 via flyctl and verified end-to-end (per-key recall across sessions + worker crash recovery); this commit puts them under version control so a CI deploy can't overwrite them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
.agents/, .claude/, .gstack/, skills-lock.json, reports/, and the dev-history markdown are local tooling artifacts, not project files. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…example Adds CLAUDE.md (root + sdk), the web/ Next.js static-export frontend, and ignores browser-agent/ which is leftover upstream example code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port the talking Pepe head from the pepe-hq (Pepe-Agent) project into the web/ frontend so the live agent has a face that lip-syncs to its own voice. - web/components/PepeHead.tsx: ported from Pepe-Agent (next/image -> plain <img> for the static export; mouth frame derived during render instead of in an effect to satisfy this repo's eslint). Float, blink, eye-tracking, 5-level volume->mouth-frame mapping, speaking pulse, speech bubble. - web/public/frames + web/public/eyes: the 14 webp sprite layers. - web/lib/media-handler.ts: AnalyserNode spliced inline on the playback graph (source -> analyser -> destination) so getAgentAmplitude() reads the agent's real 24kHz voice for lip-sync; isAgentSpeaking() from the scheduled-audio clock. Avatar-only, fail-soft, never alters the audio path. - web/hooks/useGeminiSession.ts: RAF poll exposes agentVolume/agentSpeaking while live. - web/app/page.tsx: Pepe head mounted as the "Agent" stage atop the left column; mobile-friendly padding/typography. layout.tsx: mobile viewport + theme color. - motion ^12 added for the avatar animations. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The sink ended every session by POSTing /api/sessions/complete, which the worker does not route (404, swallowed fail-soft). The real summary trigger is /api/sessions/summarize (claude-mem's Stop-hook equivalent), so no session summary was ever generated: recall showed bare "Gemini Live session" prompts with no Learned/Completed/Next-Steps, and cross-session memory never gelled.
- Replace _complete_session with _summarize_session → POST /api/sessions/summarize with contentSessionId, platformSource, and a recent-transcript recap as last_assistant_message to anchor generation.
- Add _summary_checkpoint_loop: periodic re-summarize (CLAUDE_MEM_SUMMARY_INTERVAL, default 120s) that fires only when new observations have accrued, so memory survives an unclean disconnect that skips on_session_end. Re-summarizing is the normal claude-mem path; the worker keeps the latest summary per session.
- Track activity via _obs_since_summary (incremented in _post_observation).
- Clarify init prompt label; add session.summary_recap_fallback for vision-only sessions; document CLAUDE_MEM_SUMMARY_INTERVAL and the summarize requirement.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
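The checkpoint behavior (re-summarize on a timer, but only when observations have accrued since the last summary) can be sketched like this. `SummaryCheckpointer`, `record_observation`, and the fake `summarize` coroutine are hypothetical; the real sink uses _summary_checkpoint_loop and _obs_since_summary, whose exact code is not shown here.

```python
import asyncio


class SummaryCheckpointer:
    """Periodically call summarize(), skipping ticks with no new observations."""

    def __init__(self, summarize, interval: float):
        self._summarize = summarize
        self._interval = interval
        self.obs_since_summary = 0

    def record_observation(self) -> None:
        self.obs_since_summary += 1

    async def run(self) -> None:
        while True:
            await asyncio.sleep(self._interval)
            if self.obs_since_summary == 0:
                continue  # nothing new: do not re-summarize
            self.obs_since_summary = 0
            try:
                await self._summarize()
            except Exception:
                pass  # fail-soft: a worker error must not end the loop


async def _demo():
    calls = []

    async def fake_summarize():
        calls.append("summarize")

    checkpointer = SummaryCheckpointer(fake_summarize, interval=0.01)
    task = asyncio.create_task(checkpointer.run())
    await asyncio.sleep(0.05)          # idle period: several ticks, no observations
    idle_calls = len(calls)
    checkpointer.record_observation()  # one observation accrues
    await asyncio.sleep(0.05)          # several more ticks
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return idle_calls, len(calls)


idle_calls, total_calls = asyncio.run(_demo())
```

Because the counter resets before each summarize call, a single observation produces a single checkpoint no matter how many ticks follow.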
Closing: opened against the wrong repo (fork parent) by mistake. This belongs on thedotmack/gemini-live-mem.
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Code Review
This pull request integrates the Gemini Live voice/video web application with claude-mem to enable persistent, structured memory across sessions. It introduces a Next.js static export frontend featuring a talking Pepe avatar that lip-syncs to the agent's voice, a centralized prompt configuration, and a fail-soft memory sink that handles real-time observations, vision captioning, and event-invitation image generation. The deployment is containerized to run both the FastAPI app and a patched claude-mem worker supporting bring-your-own Gemini API keys. Feedback on the changes highlights a high-frequency re-render performance bottleneck in the React state hook and a potential AttributeError in the memory sink when vision is disabled but event invitations are active.
useEffect(() => {
  if (phase !== "live") return;
  const tick = () => {
    const media = mediaRef.current;
    setAgentVolume(media ? media.getAgentAmplitude() : 0);
    setAgentSpeaking(media ? media.isAgentSpeaking() : false);
    rafRef.current = requestAnimationFrame(tick);
  };
  rafRef.current = requestAnimationFrame(tick);
  return () => {
    if (rafRef.current !== null) cancelAnimationFrame(rafRef.current);
    rafRef.current = null;
  };
}, [phase]);
Performance Bottleneck: High-Frequency Re-renders
Running a requestAnimationFrame loop that updates state (setAgentVolume, setAgentSpeaking) at the top level of useGeminiSession will force the entire Home page (including VideoStage, MemoryFeed, ChatPanel, and Composer) to re-render up to 60 times per second. This can cause severe CPU thrashing and UI stuttering, especially on mobile or lower-end devices.
Recommendation
Localize the high-frequency polling to the avatar component. You can expose a stable getter function or the mediaHandler instance from useGeminiSession, and let PepeHead (or a small wrapper around it) run the requestAnimationFrame loop and manage its own local state.
)
self.vision_model = os.getenv("CLAUDE_MEM_VISION_MODEL", "gemini-flash-latest")
self.vision_interval = 5.0  # seconds between frame captions
self._genai = genai.Client(api_key=gemini_api_key) if self.vision_enabled else None
Potential AttributeError: Uninitialized self._genai when Vision is Disabled
If CLAUDE_MEM_VISION_ENABLED is set to "false", self.vision_enabled becomes False, and self._genai is initialized to None. However, if CLAUDE_MEM_INVITATION_ENABLED is still "true", the event invitation feature will still trigger and attempt to call _extract_event_details and _render_invitation_image, both of which invoke self._genai.aio.models.generate_content(...). This will raise an unhandled AttributeError: 'NoneType' object has no attribute 'aio'.
Recommendation
Initialize self._genai when either self.vision_enabled or self.invitation_enabled is true.
- self._genai = genai.Client(api_key=gemini_api_key) if self.vision_enabled else None
+ self._genai = (
+     genai.Client(api_key=gemini_api_key)
+     if (self.vision_enabled or self.invitation_enabled)
+     else None
+ )
Fixes the dead POST /api/sessions/complete (404) by calling POST /api/sessions/summarize on session end + periodic checkpoint, so claude-mem session summaries actually generate. Verified: old=/complete→404, new=/summarize→200 queued. See commit body for detail.