Wire up claude-mem session summaries (fix dead /complete trigger)#22

Closed
thedotmack wants to merge 31 commits into
google-gemini:mainfrom
thedotmack:mango-jargon

Conversation

@thedotmack

Fixes the dead POST /api/sessions/complete call (404) by calling POST /api/sessions/summarize at session end and on a periodic checkpoint, so claude-mem session summaries are actually generated. Verified: the old /complete call returns 404; the new /summarize call returns 200 (queued). See the commit body for details.

thedotmack and others added 30 commits May 23, 2026 12:01
* feat: ingest Gemini Live conversations into the claude-mem pipeline

Forward Gemini Live conversation turns and tool calls into the normal
claude-mem worker pipeline (POST /api/sessions/observations -> queue ->
LLM extraction -> observations), the same path a Claude Code PostToolUse
hook uses. No bespoke storage layer.

- Add claude_mem_sink.py: opt-in (CLAUDE_MEM_ENABLED), fail-soft async
  sink. Inits a session, flushes one tool-use observation per completed
  turn, and forwards Gemini tool_call events 1:1. Network errors are
  swallowed so the live audio session is never disturbed.
- Wire the sink into gemini_live.py at the single drain-loop chokepoint,
  covering both the web /ws and Twilio phone paths with one insertion.
- Add httpx dependency; ignore .scratch/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile review on claude_mem_sink

- Remove unused asyncio import.
- Log non-2xx worker responses at WARNING (still fail-soft, no raise).
- Flush pending transcript before recording a tool_call observation so
  observations stay in chronological order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
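
A minimal sketch of the fail-soft, order-preserving behaviour described in the two commits above. It is not the code in claude_mem_sink.py: the class name, the payload shape, and the injected `post` coroutine (standing in for the HTTP call to the worker) are all hypothetical.

```python
import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("claude_mem_sink_sketch")

# Hypothetical transport: any coroutine that delivers one payload to the worker.
PostFn = Callable[[dict], Awaitable[None]]


class OrderedFailSoftSink:
    """Buffers transcript text and keeps observations in chronological order."""

    def __init__(self, post: PostFn) -> None:
        self._post = post
        self._pending: list[str] = []

    def add_transcript(self, text: str) -> None:
        self._pending.append(text)

    async def _safe_post(self, payload: dict) -> None:
        # Fail-soft: log at WARNING and continue, so a worker outage
        # never propagates into the live audio session.
        try:
            await self._post(payload)
        except Exception as exc:
            log.warning("observation dropped: %r", exc)

    async def flush_turn(self) -> None:
        if not self._pending:
            return
        text, self._pending = " ".join(self._pending), []
        await self._safe_post({"kind": "turn", "text": text})

    async def record_tool_call(self, name: str) -> None:
        # Flush first so the turn that led to the call is recorded before it.
        await self.flush_turn()
        await self._safe_post({"kind": "tool_call", "name": name})
```

Injecting the transport keeps the ordering and error handling testable without a running worker.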
Document the camera/screen -> Gemini Live video path (capture at 640x480
JPEG/1 FPS -> WS {type:"image"} -> video_input_queue -> send_video ->
send_realtime_input), with per-hop verification (log lines to grep),
camera/screen test steps, the secure-context requirement, how it relates
to claude-mem transcript ingestion, and a troubleshooting table. All
references cite exact file:line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Standalone claude-mem worker (host :37778 -> container :37777) that ingests
Gemini Live sessions without touching the host worker on :37777.

- start.sh: sources GEMINI_API_KEY from ../.env (single source of truth) and
  injects it into the runtime settings the container reads; clears the stale
  worker.pid that made restarts exit 0 (pid=1 always "alive" in a fresh
  container); runs worker-service.cjs in the foreground.
- gemini-live.json: "presence" mode whose taxonomy captures the person and the
  people with them from visual, behavioral, and conversational signal
  (person/companion/appearance/behavior/environment/conversation).
- settings.json: committed as a key-free template (key comes from .env).
- .gitignore: keep data/ out; the template settings.json is now tracked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wire the autonomous GeminiLiveVision captioner into the Gemini Live
send_video loop, add the observer-mode config and frontend surface for
live presence notes, and document the design + a timeline report of the
first successful live demo session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the session talks about planning an event, detect it in the turn
transcript, extract the details (title/date/time/location/host) via the
flash model, render an invitation image with Gemini's flash image model
(gemini-3.1-flash-image-preview, legible in-image text), and push it to
the frontend over the existing WebSocket as an `event_invitation` event.

- claude_mem_sink.py: detection regex, extraction, image render (handles
  base64 inline_data per stored API docs), emit-back channel, fail-soft.
- gemini_live.py: wire sink.emit to the event queue.
- frontend: render an invitation card (main.js + style.css).
- gemini-live.json v2.1.0: capture demo-discussed observation targets
  (names, headcount, expressions, accessories, atmosphere) + event planning.
- docs: event-invitation-trigger-plan.md (API decisions sourced from mem-search).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… generator

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Observe-then-push: the worker classifies an exposed secret as a `security_alert`
observation and the battle-tested TelegramNotifier pushes it — zero new notifier
code, mirroring the canonical security_alert brainbeat in code.json.

- gemini-live.json v2.2.0: register security_alert (🚨) / security_note (🔐)
  observation types with verbatim code.json descriptions. No prompt prose —
  the type registration alone is enough for the extractor to emit them.
- settings.json: documented Telegram placeholders (no secrets committed).
- start.sh: inject Telegram creds from ~/.claude-mem at launch (self-disables
  if absent so no junk requests); add WORKER_SCRIPTS_DIR to mount a
  telegram-capable worker bundle over the stale baked image (v12.1.0 predates
  the notifier).
- claude_mem_sink.py / gemini_live.py: memory-recall tools + session-start
  context injection (get_memory_timeline / get_memory_observations).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ker (#3)

* docs: add video ingestion run-and-test guide

Document the camera/screen -> Gemini Live video path (capture at 640x480
JPEG/1 FPS -> WS {type:"image"} -> video_input_queue -> send_video ->
send_realtime_input), with per-hop verification (log lines to grep),
camera/screen test steps, the secure-context requirement, how it relates
to claude-mem transcript ingestion, and a troubleshooting table. All
references cite exact file:line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add isolated claude-mem docker worker for Gemini Live presence capture

Standalone claude-mem worker (host :37778 -> container :37777) that ingests
Gemini Live sessions without touching the host worker on :37777.

- start.sh: sources GEMINI_API_KEY from ../.env (single source of truth) and
  injects it into the runtime settings the container reads; clears the stale
  worker.pid that made restarts exit 0 (pid=1 always "alive" in a fresh
  container); runs worker-service.cjs in the foreground.
- gemini-live.json: "presence" mode whose taxonomy captures the person and the
  people with them from visual, behavioral, and conversational signal
  (person/companion/appearance/behavior/environment/conversation).
- settings.json: committed as a key-free template (key comes from .env).
- .gitignore: keep data/ out; the template settings.json is now tracked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add live vision-observation pipeline, frontend, and demo journey report

Wire the autonomous GeminiLiveVision captioner into the Gemini Live
send_video loop, add the observer-mode config and frontend surface for
live presence notes, and document the design + a timeline report of the
first successful live demo session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add event-planning trigger: auto-generate an invitation image

When the session talks about planning an event, detect it in the turn
transcript, extract the details (title/date/time/location/host) via the
flash model, render an invitation image with Gemini's flash image model
(gemini-3.1-flash-image-preview, legible in-image text), and push it to
the frontend over the existing WebSocket as an `event_invitation` event.

- claude_mem_sink.py: detection regex, extraction, image render (handles
  base64 inline_data per stored API docs), emit-back channel, fail-soft.
- gemini_live.py: wire sink.emit to the event queue.
- frontend: render an invitation card (main.js + style.css).
- gemini-live.json v2.1.0: capture demo-discussed observation targets
  (names, headcount, expressions, accessories, atmosphere) + event planning.
- docs: event-invitation-trigger-plan.md (API decisions sourced from mem-search).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add "Memi's First Memories" slide deck of the demo journey report

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add security_alert → Telegram brainbeat to the Gemini Live docker worker

Observe-then-push: the worker classifies an exposed secret as a `security_alert`
observation and the battle-tested TelegramNotifier pushes it — zero new notifier
code, mirroring the canonical security_alert brainbeat in code.json.

- gemini-live.json v2.2.0: register security_alert (🚨) / security_note (🔐)
  observation types with verbatim code.json descriptions. No prompt prose —
  the type registration alone is enough for the extractor to emit them.
- settings.json: documented Telegram placeholders (no secrets committed).
- start.sh: inject Telegram creds from ~/.claude-mem at launch (self-disables
  if absent so no junk requests); add WORKER_SCRIPTS_DIR to mount a
  telegram-capable worker bundle over the stale baked image (v12.1.0 predates
  the notifier).
- claude_mem_sink.py / gemini_live.py: memory-recall tools + session-start
  context injection (get_memory_timeline / get_memory_observations).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit and consolidate every LLM-facing prompt, tool description, and
spoken/greeting string used by the running app into a single
gemini-live-genai-python-sdk/prompts.json, loaded once (fail-fast) by a
new prompts.py. No prompt text remains hard-coded in the Python modules.

Centralized from:
- gemini_live.py: assistant system instruction, memory-context injection
  (placeholder {session_start_context}), memory-recall instructions, voice
- claude_mem_sink.py: vision captioner (+ NO_CHANGE token), event-detail
  extraction, invitation-image render, the two memory-recall tool
  descriptions, and the session-init label
- twilio_handler.py: phone-call greeting
- main.py: TwiML "Connecting to Gemini Live" spoken line

The claude-mem observer mode (claude-mem-docker/gemini-live.json) is
intentionally NOT folded in: it is consumed by the separate claude-mem
worker process, not this app, across the :37777 REST boundary.

Adds PROMPTS.md documenting every prompt (in interaction order), the full
stack, the models in use, and how to edit prompts.json safely. Also
gitignores the scratch tmp/ server log.

Net -24 lines of code despite the indirection. Verified: prompts.json
loads, all placeholders format end-to-end, and all four modules import
cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Track the four demo/documentation artifacts derived from the isolated
claude-mem Docker worker's observation database (420 observations, 16
sessions, May 23 2026): the raw JSON export, the Markdown digest, the
narrative essay, and the slide deck.

- Slide deck recompressed at JPEG quality 80 (13 MB -> 0.56 MB) by
  rasterizing each page at 150 dpi; text and illustrations remain legible.
- Deliberately-fake test credentials (sk_live_, AKIA, the AWS docs example
  secret) used to exercise the security-alert path were redacted to
  placeholders in the digest and JSON so they don't trip secret scanning;
  the alerts' narrative is preserved.
- Adds reports/README.md describing all four files, the dataset, and the
  redaction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- claude_mem_sink.py: drop the unused CLAUDE_MEM_VISION_INTERVAL_SECONDS env
  knob and inline the 5s captioner cadence as a constant
- docs: observation strategy + actions-system design and execution plan

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The isolated claude-mem worker was silently generating observations with
gemini-2.5-flash, not Claude — the "gemini-live" MODE name was conflated with
the observer PROVIDER. Yesterday's entire demo (785 obs) was Gemini-written.

- settings.json: provider=claude, CLAUDE_MEM_MODEL=claude-sonnet-4-6,
  auth_method=subscription (matches the host worker).
- start.sh: extract the host Claude OAuth creds (keychain 'Claude Code-credentials',
  else ~/.claude/.credentials.json) and mount them read-only so the container's
  subscription auth works — mirrors claude-mem's shipped docker/claude-mem/run.sh.
  Rewrote the header comment that kept "proving" gemini was intentional (MODE !=
  PROVIDER), and made the empty SCRIPTS_MOUNT array expansion bash-3.2 safe.
- gitignore the transient .auth/ creds dir and db-backups/.

Verified: new observations now carry generated_by_model=claude-sonnet-4-6;
worker stable (RestartCount=0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e-prompts

# Conflicts:
#	gemini-live-genai-python-sdk/claude-mem-docker/.gitignore
#	gemini-live-genai-python-sdk/claude-mem-docker/settings.json
#	gemini-live-genai-python-sdk/claude-mem-docker/start.sh
#	gemini-live-genai-python-sdk/claude_mem_sink.py
#	gemini-live-genai-python-sdk/gemini_live.py
Greptile flagged three sites where str.format() interpolated untrusted
runtime content (LLM captions, API-fetched memory, rolling conversation)
into prompt templates. Any {key} pattern in that content triggered a
KeyError — silently and permanently disabling the vision captioner, or
tearing down the Gemini Live session at startup.

Replace .format() with literal .replace("{placeholder}", value) at all
three sites. In the VISION_PROMPT chain the trusted {no_change_token} is
replaced before the untrusted {prev}, so injected braces cannot trigger a
second substitution.

- claude_mem_sink.py:240  VISION_PROMPT (prev, no_change_token)
- claude_mem_sink.py:322  event_invitation extraction_prompt (conversation)
- gemini_live.py:50       memory_context_section (session_start_context)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
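
The substitution-order point in the commit above can be illustrated with a small sketch. The call sites are not shown on this page, so it assumes the failure arose when `.format()` ran over a string that already contained untrusted text (plain `str.format()` does not rescan the values it substitutes). The template text and function names are hypothetical.

```python
# Hypothetical template; the real VISION_PROMPT lives in prompts.json.
VISION_PROMPT_TEMPLATE = (
    "Previous caption: {prev}\nReply {no_change_token} if nothing changed."
)


def render_with_format(template: str, prev: str, token: str) -> str:
    # Assumed failure mode: untrusted text is spliced in first, then
    # .format() parses the combined string, including braces from `prev`.
    staged = template.replace("{prev}", prev)
    return staged.format(no_change_token=token)


def render_with_replace(template: str, prev: str, token: str) -> str:
    # Trusted placeholder first, untrusted value last: braces inside
    # `prev` are inserted literally and never scanned again.
    return template.replace("{no_change_token}", token).replace("{prev}", prev)


if __name__ == "__main__":
    caption = "a sign reading {sale} and {no_change_token}"
    try:
        render_with_format(VISION_PROMPT_TEMPLATE, caption, "NO_CHANGE")
    except KeyError as exc:
        print("format failed on injected key:", exc)
    print(render_with_replace(VISION_PROMPT_TEMPLATE, caption, "NO_CHANGE"))
```

Reversing the two `.replace()` calls would let a caption containing `{no_change_token}` be rewritten by the second pass, which is the case the commit's ordering guards against.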
Centralize all app prompts into one prompts.json
Single-container image runs the Gemini Live FastAPI app plus a claude-mem
worker (provider=gemini, Chroma off) so memory works in the cloud without
Claude OAuth. Adds combined Dockerfile, entrypoint, fly.toml (always-on single
machine + persistent volume for the memory DB), and a deploy guide.

https://claude.ai/code/session_01KMmTdL5PjucWj8kZS2fA3H
…kbmcL

Add Fly.io deployment with bundled claude-mem worker
Deploys the app + claude-mem worker to Fly on push to main (or manual
dispatch), running on GitHub runners that can reach Fly. Idempotently
creates the app and memory volume, stages the Gemini key, and deploys via
remote builder. Requires repo secrets FLY_API_TOKEN and GEMINI_API_KEY.

https://claude.ai/code/session_01KMmTdL5PjucWj8kZS2fA3H
…kbmcL

Auto-deploy to Fly via GitHub Actions
Add a fail-soft background poller in MemorySink (mirrors _caption_loop) that
reads newly-extracted observations back from the claude-mem worker
(GET /api/observations?project=...) and pushes each through the existing
self.emit channel as an "observation" event — the same channel the event
invitation already uses. Seeds a high-water mark at session start so the feed
streams only memories formed during this session.

No new storage: it surfaces the observations this pipeline already records, so
the browser can show the memory building in real time (the demo's "notes on
screen"). Controlled by CLAUDE_MEM_MEMORY_FEED_ENABLED (default on) and fully
fail-soft: a down or unreachable worker never disturbs the live audio session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
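
The high-water-mark idea in the commit above can be sketched as follows. It assumes each observation returned by the worker carries a monotonically increasing integer `id`; that field name, like both function names, is an assumption and not taken from the claude-mem API.

```python
from typing import Iterable


def seed_high_water(rows: Iterable[dict]) -> int:
    """Mark taken at session start: everything at or below it is old memory."""
    return max((row["id"] for row in rows), default=0)


def new_observations(rows: Iterable[dict], high_water: int) -> tuple[list[dict], int]:
    """Return rows above the mark (oldest first) and the advanced mark."""
    fresh = sorted(
        (row for row in rows if row["id"] > high_water),
        key=lambda row: row["id"],
    )
    new_mark = fresh[-1]["id"] if fresh else high_water
    return fresh, new_mark
```

A poll loop would call `new_observations` on each fetch, emit every returned row, and store the returned mark, so a repeated fetch of the same rows emits nothing twice.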
Render the "observation" events streamed by the sink into a new Memory panel
under the camera in the left pane: each memory is a card with its type emoji
(matching the gemini-live.json mode), title, and subtitle, popped in as it
forms. Mirrors the existing appendInvitation pattern; uses textContent only
(observations are model-generated text). Resets with the session.

This is the demo's "notes displayed live on screen" — the agent's memory made
visible in the app that already exists, not a new app.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The docker worker was switched to Claude (claude-sonnet-4-6) in 3a05eb9 on
the assumption the demo's gemini provider was a misconfiguration. It wasn't:
the hackathon demo ran observation extraction on gemini-2.5-flash and it was
noticeably faster. Switch the observer back to gemini and stop start.sh from
hard-requiring Claude OAuth creds.

- settings.json: CLAUDE_MEM_PROVIDER claude -> gemini; drop the claude-only
  auth/model keys (gemini uses GEMINI_API_KEY from .env).
- start.sh: gate OAuth cred extraction + mount behind provider=claude so a
  gemini worker launches on the API key alone; fix the misleading header.

Verified: worker relaunches healthy on :37778 with provider=gemini,
gemini-2.5-flash in the rendered runtime settings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live memory feed + restore Gemini observer provider
The public demo ran on a single shared free-tier server key, which a single
live session exhausts in seconds (429 quota), killing memory observations for
everyone. This makes each visitor supply their own Gemini API key in the UI;
the live session AND the memory observations run on that key, isolated per key.

Backend (gemini-live-genai-python-sdk):
- main.py: read a mandatory {"type":"setup","api_key":"..."} as the FIRST WS
  frame (body, never a query param/log) and build the live client with it;
  reject the connection if absent. Surface a live-session failure (e.g. a key
  without Gemini Live access) as a visible error instead of a silent dead
  session.
- claude_mem_sink.py: thread the visitor key in; normalize once; derive a
  per-visitor project namespace gemini-live-<sha256(key)[:12]> (one-way +
  deterministic, so a returning visitor recovers their own memories and two
  keys never see each other's feed); send geminiApiKey at /api/sessions/init.
- gemini_live.py: pass the session key to the sink.

Worker (vendored): claude-mem 13.3.0 rebuilt from a clean checkout with a
per-session geminiApiKey override (getGeminiConfig prefers it; held in-memory
only, never persisted). Vendored as claude-mem-docker/worker-service.cjs and
overwritten into the image; reproducible via claude-mem-docker/worker-byo-key.patch.

Image (Dockerfile, docker-entrypoint.sh): copy the patched worker over the
published one; boot the worker with a clearly-fake placeholder key (NOT empty —
an empty key makes the worker fall back to the Claude SDK, which has no creds in
this image; the placeholder keeps the Gemini provider selected and is overridden
per session). GEMINI_API_KEY is now optional (Twilio-only).

Frontend: mandatory password key input gating Connect, persisted in
localStorage (paste-once), sent as the first WS frame; error surfaced on reject.

Also includes the previously-uncommitted WebSocket-disconnect mid-session fix
(client_disconnected handling + /api/sessions/complete).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
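
A sketch of validating the mandatory first WebSocket frame described in the commit above, using only the frame shape it gives ({"type":"setup","api_key":"..."}). The function and exception names are hypothetical, and whitespace stripping is assumed as the "normalize once" step.

```python
import json


class SetupError(ValueError):
    """Raised when the first frame is not a usable setup message."""


def parse_setup_frame(raw: str) -> str:
    """Return the normalized API key from the first WS frame, or raise."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SetupError("first frame is not valid JSON") from exc
    if not isinstance(msg, dict) or msg.get("type") != "setup":
        raise SetupError("first frame must be a setup message")
    key = msg.get("api_key")
    if not isinstance(key, str) or not key.strip():
        raise SetupError("setup frame is missing api_key")
    # Assumed normalization: trim surrounding whitespace from a pasted key.
    return key.strip()
```

The caller would close the connection on `SetupError` and should avoid logging `raw`, since the frame contains the key.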
Per-visitor isolation: claude-mem project namespace derived from
sha256(gemini_api_key)[:12], so each BYO key gets its own memory and
returning visitors recover prior observations. Batch reads now scope to
the caller's namespace, closing a cross-tenant read leak.

Worker resilience: docker-entrypoint.sh gains a start_worker()/supervise_worker()
loop that clears stale pid and respawns the worker on health failure; the
Python sink re-inits with the session key when the worker pid changes.

These changes were already deployed to Fly v5 via flyctl and verified
end-to-end (per-key recall across sessions + worker crash recovery); this
commit puts them under version control so a CI deploy can't overwrite them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
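
The per-visitor namespace from the commit above, gemini-live-<sha256(key)[:12]>, can be sketched with the standard library. Taking the first 12 characters of the hex digest is an assumption, as are the function names and the `project` field used for scoping.

```python
import hashlib


def project_namespace(api_key: str) -> str:
    """Derive a deterministic, one-way project name from a visitor's key."""
    digest = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
    return f"gemini-live-{digest[:12]}"


def scope_to_namespace(rows: list[dict], namespace: str) -> list[dict]:
    """Drop any row that does not belong to the caller's namespace."""
    return [row for row in rows if row.get("project") == namespace]
```

The same key always maps to the same namespace, which is what lets a returning visitor recover earlier observations, while the key itself cannot be read back from the project name.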
.agents/, .claude/, .gstack/, skills-lock.json, reports/, and the
dev-history markdown are local tooling artifacts, not project files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…example

Adds CLAUDE.md (root + sdk), the web/ Next.js static-export frontend, and
ignores browser-agent/ which is leftover upstream example code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port the talking Pepe head from the pepe-hq (Pepe-Agent) project into the
web/ frontend so the live agent has a face that lip-syncs to its own voice.

- web/components/PepeHead.tsx: ported from Pepe-Agent (next/image -> plain
  <img> for the static export; mouth frame derived during render instead of
  in an effect to satisfy this repo's eslint). Float, blink, eye-tracking,
  5-level volume->mouth-frame mapping, speaking pulse, speech bubble.
- web/public/frames + web/public/eyes: the 14 webp sprite layers.
- web/lib/media-handler.ts: AnalyserNode spliced inline on the playback graph
  (source -> analyser -> destination) so getAgentAmplitude() reads the agent's
  real 24kHz voice for lip-sync; isAgentSpeaking() from the scheduled-audio
  clock. Avatar-only, fail-soft, never alters the audio path.
- web/hooks/useGeminiSession.ts: RAF poll exposes agentVolume/agentSpeaking
  while live.
- web/app/page.tsx: Pepe head mounted as the "Agent" stage atop the left
  column; mobile-friendly padding/typography. layout.tsx: mobile viewport
  + theme color.
- motion ^12 added for the avatar animations.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The sink ended every session by POSTing /api/sessions/complete, which the
worker does not route (404, swallowed fail-soft). The real summary trigger is
/api/sessions/summarize (claude-mem's Stop-hook equivalent), so no session
summary was ever generated — recall showed bare "Gemini Live session" prompts
with no Learned/Completed/Next-Steps, and cross-session memory never gelled.

- Replace _complete_session with _summarize_session → POST /api/sessions/summarize
  with contentSessionId, platformSource, and a recent-transcript recap as
  last_assistant_message to anchor generation.
- Add _summary_checkpoint_loop: periodic re-summarize (CLAUDE_MEM_SUMMARY_INTERVAL,
  default 120s) that fires only when new observations have accrued, so memory
  survives an unclean disconnect that skips on_session_end. Re-summarizing is the
  normal claude-mem path; the worker keeps the latest summary per session.
- Track activity via _obs_since_summary (incremented in _post_observation).
- Clarify init prompt label; add session.summary_recap_fallback for vision-only
  sessions; document CLAUDE_MEM_SUMMARY_INTERVAL and the summarize requirement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
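
A minimal sketch of the checkpoint logic described in the commit above, assuming the summarize call is injected as a coroutine. The payload keys and the CLAUDE_MEM_SUMMARY_INTERVAL default come from the commit message; the class, its method names, and the `platform_source` value passed by a caller are hypothetical.

```python
import asyncio
import os
from typing import Awaitable, Callable

SUMMARY_INTERVAL = float(os.getenv("CLAUDE_MEM_SUMMARY_INTERVAL", "120"))


def build_summarize_payload(session_id: str, recap: str, platform_source: str) -> dict:
    """Body for POST /api/sessions/summarize, using the fields the commit names."""
    return {
        "contentSessionId": session_id,
        "platformSource": platform_source,
        "last_assistant_message": recap,
    }


class SummaryCheckpointer:
    def __init__(self, summarize: Callable[[], Awaitable[None]],
                 interval: float = SUMMARY_INTERVAL) -> None:
        self._summarize = summarize
        self._interval = interval
        self.obs_since_summary = 0

    def note_observation(self) -> None:
        self.obs_since_summary += 1

    async def checkpoint_once(self) -> bool:
        """Summarize only if observations accrued; report whether it ran."""
        pending = self.obs_since_summary
        if pending == 0:
            return False
        try:
            await self._summarize()
        except Exception:
            return False  # fail-soft: keep the count so the next tick retries
        # Subtract rather than zero, so observations that arrived while
        # the request was in flight still count toward the next checkpoint.
        self.obs_since_summary -= pending
        return True

    async def run(self) -> None:
        while True:
            await asyncio.sleep(self._interval)
            await self.checkpoint_once()
```

Running `run()` as a background task and cancelling it at session end leaves the final summary to the normal end-of-session path.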
@thedotmack

Author

Closing — opened against the wrong repo (fork parent) by mistake. This belongs on thedotmack/gemini-live-mem.

@thedotmack thedotmack closed this May 30, 2026
@google-cla

google-cla Bot commented May 30, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request integrates the Gemini Live voice/video web application with claude-mem to enable persistent, structured memory across sessions. It introduces a Next.js static export frontend featuring a talking Pepe avatar that lip-syncs to the agent's voice, a centralized prompt configuration, and a fail-soft memory sink that handles real-time observations, vision captioning, and event-invitation image generation. The deployment is containerized to run both the FastAPI app and a patched claude-mem worker supporting bring-your-own Gemini API keys. Feedback on the changes highlights a high-frequency re-render performance bottleneck in the React state hook and a potential AttributeError in the memory sink when vision is disabled but event invitations are active.

Comment on lines +81 to +94
useEffect(() => {
  if (phase !== "live") return;
  const tick = () => {
    const media = mediaRef.current;
    setAgentVolume(media ? media.getAgentAmplitude() : 0);
    setAgentSpeaking(media ? media.isAgentSpeaking() : false);
    rafRef.current = requestAnimationFrame(tick);
  };
  rafRef.current = requestAnimationFrame(tick);
  return () => {
    if (rafRef.current !== null) cancelAnimationFrame(rafRef.current);
    rafRef.current = null;
  };
}, [phase]);

high

Performance Bottleneck: High-Frequency Re-renders

Running a requestAnimationFrame loop that updates state (setAgentVolume, setAgentSpeaking) at the top level of useGeminiSession will force the entire Home page (including VideoStage, MemoryFeed, ChatPanel, and Composer) to re-render up to 60 times per second. This can cause severe CPU thrashing and UI stuttering, especially on mobile or lower-end devices.

Recommendation

Localize the high-frequency polling to the avatar component. You can expose a stable getter function or the mediaHandler instance from useGeminiSession, and let PepeHead (or a small wrapper around it) run the requestAnimationFrame loop and manage its own local state.

)
self.vision_model = os.getenv("CLAUDE_MEM_VISION_MODEL", "gemini-flash-latest")
self.vision_interval = 5.0 # seconds between frame captions
self._genai = genai.Client(api_key=gemini_api_key) if self.vision_enabled else None

high

Potential AttributeError: Uninitialized self._genai when Vision is Disabled

If CLAUDE_MEM_VISION_ENABLED is set to "false", self.vision_enabled becomes False, and self._genai is initialized to None. However, if CLAUDE_MEM_INVITATION_ENABLED is still "true", the event invitation feature will still trigger and attempt to call _extract_event_details and _render_invitation_image, both of which invoke self._genai.aio.models.generate_content(...). This will raise an unhandled AttributeError: 'NoneType' object has no attribute 'aio'.

Recommendation

Initialize self._genai if either self.vision_enabled or self.invitation_enabled is enabled.

Suggested change
- self._genai = genai.Client(api_key=gemini_api_key) if self.vision_enabled else None
+ self._genai = (
+     genai.Client(api_key=gemini_api_key)
+     if (self.vision_enabled or self.invitation_enabled)
+     else None
+ )
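
The recommendation can be checked in isolation with a small sketch. To stay runnable without the google-genai package, client construction is passed in as a `factory` callable; that parameter and both function names are hypothetical, and the real sink builds `genai.Client(api_key=...)` directly.

```python
from typing import Callable, Optional, TypeVar

ClientT = TypeVar("ClientT")


def make_genai_client(
    vision_enabled: bool,
    invitation_enabled: bool,
    factory: Callable[[], ClientT],
) -> Optional[ClientT]:
    """Build the client when any feature that needs it is switched on."""
    if vision_enabled or invitation_enabled:
        return factory()
    return None


def require_client(client: Optional[ClientT]) -> ClientT:
    """Turn a missing client into a clear error at the call site."""
    if client is None:
        raise RuntimeError("Gemini client is not configured for this feature")
    return client
```

With vision off and invitations on, `make_genai_client(False, True, factory)` returns a client, so the invitation path no longer dereferences `None`. A guard like `require_client` would still report a clear error if both flags are off.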
