Elara — The Eye

Role in the multi-agent system: This service is the perception layer — the eye. It runs locally on a Raspberry Pi 5, watches the physical world in front of it, identifies who is present, tracks where they are, reads their emotional state, and understands the scene around them. Every meaningful observation is logged as a snapshot and must be synced to the central multi-agent database running on RunPod so all cloud agents share the same ground truth about the user.

What this is

Elara is a FastAPI microservice that does four things continuously:

Presence detection — knows when a registered user is in front of the camera
Face tracking — follows the user with a pan-tilt servo camera (via ESP32)
Identity verification — confirms who is present, not just a face
Perception analysis — reads emotion from the face, describes the scene, and reasons about causal links between environment and emotional state

The output of all four — who is present, where they are, what they feel, why — is written to a local SQLite timeline and must be mirrored into the multi-agent system's database on RunPod so every agent in the system can make decisions informed by real-world user state.

Where this fits in the multi-agent system

┌─────────────────────────────────────────────────────────────┐
│                    RunPod (Cloud)                           │
│                                                             │
│   ┌──────────┐   ┌──────────┐   ┌──────────┐              │
│   │  Agent A │   │  Agent B │   │  Agent C │  ...         │
│   └────┬─────┘   └────┬─────┘   └────┬─────┘              │
│        │              │              │                      │
│        └──────────────┴──────────────┘                     │
│                       │                                     │
│            ┌──────────▼──────────┐                         │
│            │  Central Agent DB   │  ← perception section   │
│            │  (perception table) │    written here         │
│            └──────────▲──────────┘                         │
│                       │  sync / push                        │
└───────────────────────┼─────────────────────────────────────┘
                        │
            ┌───────────┴───────────┐
            │   Raspberry Pi 5      │
            │   (this service)      │
            │                       │
            │  ┌─────────────────┐  │
            │  │ Camera + Servos │  │
            │  └────────┬────────┘  │
            │           │           │
            │  ┌────────▼────────┐  │
            │  │  Face Tracker   │  │  ← 20 fps, Haar + PID
            │  └────────┬────────┘  │
            │           │           │
            │  ┌────────▼────────┐  │
            │  │  Identity Check │  │  ← dlib, every 3s
            │  └────────┬────────┘  │
            │           │           │
            │  ┌────────▼────────┐  │
            │  │  Analyzer       │  │  ← HSEmotion + Moondream + Mistral
            │  └────────┬────────┘  │
            │           │           │
            │  ┌────────▼────────┐  │
            │  │  timeline.db    │  │  ← local SQLite (source of truth)
            │  └─────────────────┘  │
            └───────────────────────┘

Cloud agents query the perception section of the central DB to know the current state of the user before deciding what to do. Elara is the only writer to this section.

Perception data produced

Every time a meaningful change is detected (emotion shifts, scene changes, new subjects appear), a snapshot is recorded. Each snapshot contains:

Field	Type	Description
`ts`	ISO-8601 UTC	When the snapshot was taken
`user`	string	Identified user name
`emotion`	string	Dominant emotion: `happiness`, `sadness`, `anger`, `fear`, `disgust`, `contempt`, `neutral`, `surprise`
`confidence`	float 0–1	Emotion model confidence
`emotion_scores`	JSON object	Full probability distribution across all 8 emotions
`scene_description`	string	Free-form description of surroundings (objects, animals, environment)
`subjects`	JSON array	Up to 5 key entities in the scene
`affected`	bool	Whether the scene plausibly caused the emotion
`reason`	string	Causal explanation if `affected` is true
`summary`	string	One-sentence synthesis — e.g. `"user happy because dog is playing"`
`thumbnail`	BLOB	320px-wide JPEG of the frame at snapshot time

Deduplication strategy

Elara does not write a row every N seconds blindly. A new row is written only when:

Emotion label changes
affected flag flips
More than 50% of scene subjects change (Jaccard similarity < 0.5 between the old and new subject sets)

This keeps the timeline lean — typically a few dozen rows per session, not thousands.
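The three rules can be sketched as a single predicate. This is a minimal illustration, not the code in timeline.py: the function names, the dict keys, and the reading of the third rule as "Jaccard similarity of the subject sets below 0.5" are assumptions.

```python
from typing import Iterable, Optional


def jaccard_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """Return |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def should_record(prev: Optional[dict], new: dict, threshold: float = 0.5) -> bool:
    """Decide whether `new` differs enough from `prev` to deserve a row.

    Hypothetical helper: `prev` is the last stored snapshot (or None),
    and both dicts carry `emotion`, `affected` and `subjects`.
    """
    if prev is None:
        return True  # nothing stored yet, so the first snapshot is always kept
    if prev["emotion"] != new["emotion"]:
        return True  # rule 1: emotion label changed
    if prev["affected"] != new["affected"]:
        return True  # rule 2: affected flag flipped
    # rule 3: less than half of the combined subjects are shared
    return jaccard_similarity(prev["subjects"], new["subjects"]) < threshold
```

With this reading, going from `["desk", "cat"]` to `["desk", "dog"]` triggers a write (similarity 1/3), while adding a single `"lamp"` to the same scene does not (similarity 2/3).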

Integration with the multi-agent DB on RunPod

What needs to happen

The faces/timeline.db on the Pi is the local source of truth. The central DB on RunPod needs a perception_snapshots table (or equivalent) that mirrors this data. There are two integration patterns:

Option A (push, recommended): After each timeline.maybe_record() call in monitor.py, push the new row (if one was written) to the RunPod DB over HTTP or a direct DB connection. Low latency, no polling.

Option B (pull/sync): A RunPod agent periodically calls GET /timeline (to be added) and upserts rows on the (user, ts) unique key. Simpler to implement, at the cost of a lag of up to one polling interval.
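A rough sketch of the push pattern using only the standard library. The ingest URL and both function names are hypothetical, since no RunPod endpoint is defined yet, and the snapshot dict is assumed to hold only JSON-serialisable fields (the thumbnail BLOB would need to be dropped or base64-encoded first).

```python
import json
import urllib.request


def build_push_request(snapshot: dict, endpoint: str) -> urllib.request.Request:
    """Wrap one snapshot row in a JSON POST request (nothing is sent yet)."""
    body = json.dumps(snapshot).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_snapshot(snapshot: dict, endpoint: str, timeout_s: float = 5.0) -> bool:
    """Send the snapshot; return False instead of raising on network errors.

    A False result means the row stays only in the local timeline.db and
    should be retried later, for example by a pull/sync pass.
    """
    request = build_push_request(snapshot, endpoint)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False
```

Keeping the request builder separate from the network call makes the payload easy to unit-test without a reachable server.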

Schema for the perception section of the central DB

CREATE TABLE perception_snapshots (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    ts              TEXT    NOT NULL,           -- ISO-8601 UTC from Pi
    user            TEXT    NOT NULL,           -- identified user name
    emotion         TEXT    NOT NULL,
    confidence      REAL    NOT NULL,
    emotion_scores  TEXT    NOT NULL,           -- JSON string
    scene           TEXT    NOT NULL,           -- scene_description from the snapshot
    subjects        TEXT    NOT NULL,           -- JSON array string
    affected        INTEGER NOT NULL,           -- 0 or 1
    reason          TEXT    NOT NULL,
    summary         TEXT    NOT NULL,
    source_device   TEXT    DEFAULT 'pi',       -- for multi-device future
    synced_at       TEXT,                       -- when it arrived in central DB
    UNIQUE(user, ts)                            -- prevent duplicates on re-sync
);

CREATE INDEX idx_perception_user_ts ON perception_snapshots(user, ts);
CREATE INDEX idx_perception_emotion ON perception_snapshots(emotion);
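The UNIQUE(user, ts) constraint is what makes re-syncing safe. The sketch below shows one way to rely on it, assuming the central DB accepts SQLite syntax as the schema above does; `insert_if_new` is a hypothetical helper, not an existing function in this repo.

```python
import sqlite3
from datetime import datetime, timezone

# Same table as above, condensed; created in memory for the example.
SCHEMA = """
CREATE TABLE IF NOT EXISTS perception_snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts TEXT NOT NULL, user TEXT NOT NULL, emotion TEXT NOT NULL,
    confidence REAL NOT NULL, emotion_scores TEXT NOT NULL,
    scene TEXT NOT NULL, subjects TEXT NOT NULL, affected INTEGER NOT NULL,
    reason TEXT NOT NULL, summary TEXT NOT NULL,
    source_device TEXT DEFAULT 'pi', synced_at TEXT,
    UNIQUE(user, ts)
);
"""

COLUMNS = ("ts", "user", "emotion", "confidence", "emotion_scores",
           "scene", "subjects", "affected", "reason", "summary")


def insert_if_new(conn: sqlite3.Connection, row: dict) -> bool:
    """Insert one snapshot; return False if (user, ts) is already stored."""
    values = [row[column] for column in COLUMNS]
    values.append(datetime.now(timezone.utc).isoformat())  # synced_at
    placeholders = ", ".join("?" for _ in values)
    cursor = conn.execute(
        f"INSERT OR IGNORE INTO perception_snapshots "
        f"({', '.join(COLUMNS)}, synced_at) VALUES ({placeholders})",
        values,
    )
    conn.commit()
    return cursor.rowcount == 1  # 0 when the unique constraint skipped the row
```

Because a repeated row is silently skipped, the sync job can resend an overlapping window of snapshots after a failure without creating duplicates.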

API endpoint for cloud agents to read current state

Add this to main.py (not yet implemented):

GET /timeline              → all users, newest first (JSON array)
GET /timeline/{user}       → one user, newest first
GET /timeline/{user}/latest → single most recent snapshot for quick polling
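Since these endpoints are not implemented yet, here is a hedged sketch of the queries they could run. It assumes the local table is named `perception_snapshots` and has `user` and `ts` columns; the real table name in timeline.db may differ. The FastAPI route wiring in main.py is left out; each route would simply return the result of one of these helpers.

```python
import sqlite3
from typing import Optional


def fetch_timeline(conn: sqlite3.Connection,
                   user: Optional[str] = None,
                   limit: int = 100) -> list[dict]:
    """Return snapshots newest first, optionally restricted to one user.

    Sorting on the text column works because ISO-8601 UTC timestamps in a
    single consistent format sort lexicographically in time order.
    """
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    if user is None:
        cursor = conn.execute(
            "SELECT * FROM perception_snapshots ORDER BY ts DESC LIMIT ?",
            (limit,),
        )
    else:
        cursor = conn.execute(
            "SELECT * FROM perception_snapshots WHERE user = ? "
            "ORDER BY ts DESC LIMIT ?",
            (user, limit),
        )
    return [dict(row) for row in cursor]


def fetch_latest(conn: sqlite3.Connection, user: str) -> Optional[dict]:
    """Return the most recent snapshot for `user`, or None if there is none."""
    rows = fetch_timeline(conn, user, limit=1)
    return rows[0] if rows else None
```

The `limit` parameter is an addition for safety; an unbounded GET /timeline would also return every thumbnail if that column is selected.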

What cloud agents should do with this data

Each agent should read the most recent perception snapshot for the active user before generating a response or taking an action. The summary field is the fastest signal; use emotion_scores for nuance.

Example agent prelude:

User: abhi
Current state: sadness (87% confidence)
Scene: sitting at desk, cat on floor nearby
Cause: user sad because cat is on the floor
→ Adjust tone: be gentle, avoid high-energy responses
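A prelude like the one above can be generated directly from a snapshot. This is a small sketch: `build_prelude` is a hypothetical helper, the dict keys follow the field table earlier in this document, and the tone-adjustment line is left to each agent's own policy.

```python
def build_prelude(snapshot: dict) -> str:
    """Format the latest perception snapshot as a short text block for an agent."""
    lines = [
        f"User: {snapshot['user']}",
        f"Current state: {snapshot['emotion']} ({snapshot['confidence']:.0%} confidence)",
        f"Scene: {snapshot['scene_description']}",
    ]
    # Only claim a cause when the analyzer flagged the scene as relevant.
    if snapshot.get("affected"):
        lines.append(f"Cause: {snapshot['summary']}")
    return "\n".join(lines)
```

Omitting the cause line when `affected` is false keeps agents from acting on a causal link the analyzer did not assert.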

How the perception pipeline works

/frames/event  (GET, atomic JSON)
      │
      ▼
monitor.py  [polls every PERIOD_S seconds, skips if no identified user]
      │
      ├─── Stage 1: HSEmotion (EfficientNet-B0, face crop)
      │           → emotion label + confidence + 8-class scores
      │
      ├─── Stage 2: Moondream VLM (full frame)
      │           → scene description in natural language
      │
      └─── Stage 3: Mistral LLM (emotion + scene combined)
                  → structured JSON: subjects, affected flag, reason, summary
                        │
                        ▼
                  timeline.py  [write only if meaningfully changed]
                        │
                        ▼
                  faces/timeline.db  (SQLite)
                        │
                        ▼
               ── sync ──► RunPod central DB

Running the service

Requirements: Python ≥ 3.12, uv, Ollama running locally with moondream and mistral:latest

cd Face_login_Elara-master
uv sync

# Start the main server
uv run uvicorn main:app --host 0.0.0.0 --port 8765

# Start the perception monitor (separate process, same uv environment)
uv run python monitor.py --period 60

# Point monitor at the Pi from another machine
PI_URL=http://<pi-ip>:8765 uv run python monitor.py --period 30

Environment variables:

Variable	Default	Description
`PI_URL`	`http://localhost:8765`	Where monitor.py finds the camera service
`MONITOR_PERIOD_S`	`60`	Seconds between perception ticks
`OLLAMA_BASE_URL`	`http://localhost:11434`	Ollama instance for VLM + LLM
`ANALYZER_VLM_MODEL`	`moondream`	Vision model for scene description
`ANALYZER_LLM_MODEL`	`mistral:latest`	Language model for structured reasoning
`DEVICE_MODE`	auto	`pi` or `laptop` — forces camera mode
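For reference, the variables and defaults in this table could be read as follows. This is an illustrative sketch only: `MonitorSettings` and `load_settings` are hypothetical names, and the actual parsing in monitor.py and config.py may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MonitorSettings:
    pi_url: str
    period_s: float
    ollama_base_url: str
    vlm_model: str
    llm_model: str
    device_mode: str


def load_settings(env: Mapping[str, str] = os.environ) -> MonitorSettings:
    """Read the documented variables, falling back to the documented defaults."""
    return MonitorSettings(
        # Strip a trailing slash so paths like /frames/event can be appended safely.
        pi_url=env.get("PI_URL", "http://localhost:8765").rstrip("/"),
        period_s=float(env.get("MONITOR_PERIOD_S", "60")),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/"),
        vlm_model=env.get("ANALYZER_VLM_MODEL", "moondream"),
        llm_model=env.get("ANALYZER_LLM_MODEL", "mistral:latest"),
        device_mode=env.get("DEVICE_MODE", "auto"),
    )
```

Passing the environment in as a mapping makes the defaults easy to check in a test without touching the real process environment.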

API reference

Face database

Method	Path	Description
`GET`	`/faces`	List all registered names
`POST`	`/register`	Register a new face — `{name, images: [base64, ...]}`
`DELETE`	`/faces/{name}`	Delete a face

Authentication

Method	Path	Description
`POST`	`/login`	Match face → `{success, name, token, confidence}`. On success, tells tracker who to watch for.

Tracking — servo state

Method	Path	Description
`GET`	`/track/status`	Pan/tilt angles, PID errors, FPS
`GET`	`/track/snapshot`	Annotated JPEG with tracking overlay
`GET`	`/track/stream`	MJPEG stream at 20 fps with overlay
`POST`	`/track/feed`	Push a browser frame (browser-feed mode)

Identity (Channel 1 — low bandwidth)

Poll these every few seconds from cloud agents that only need identity, not the image.

Method	Path	Response
`GET`	`/track/person`	`{person, position, identified_person_present, identified_name, last_recognition_ts}`
`GET`	`/track/identity`	Who the tracker is watching for
`DELETE`	`/track/identity`	Clear identity; recognition worker idles

Frames (Channel 2 — for agents that need the image)

Method	Path	Description
`GET`	`/frames/event`	Atomic JSON: `{timestamp, frame:{base64,...}, person, position, box}` — use this for analyzer
`GET`	`/frames/current`	Raw JPEG + identity in response headers — bandwidth-efficient
`GET`	`/frames/stream`	Continuous MJPEG, no overlay

/frames/event response shape:

{
  "timestamp": 1715000000.123,
  "frame": {"base64": "...", "width": 640, "height": 480},
  "person": "abhi",
  "identified_name": "abhi",
  "identified_person_present": true,
  "position": {"x": 312, "y": 240},
  "box": {"top": 180, "right": 380, "bottom": 300, "left": 250}
}
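A consumer of this payload might unpack it along these lines. The sketch assumes `frame.base64` holds the encoded image bytes (likely JPEG, though that is not stated above) and that `box` may be absent when no face is detected; `parse_frame_event` is a hypothetical helper.

```python
import base64
import json
from typing import Optional, Tuple


def parse_frame_event(payload: str) -> Tuple[dict, bytes, Optional[Tuple[int, int]]]:
    """Split a /frames/event body into metadata, image bytes and face-box size.

    Returns (event, image_bytes, (width, height)) where the size is None
    if the payload carries no bounding box.
    """
    event = json.loads(payload)
    image_bytes = base64.b64decode(event["frame"]["base64"])
    box = event.get("box")
    face_size = None
    if box:
        # The box uses (top, right, bottom, left) pixel coordinates.
        face_size = (box["right"] - box["left"], box["bottom"] - box["top"])
    return event, image_bytes, face_size
```

For the example response above, the face box works out to 130 by 120 pixels, which is the region an emotion model would crop from the 640 by 480 frame.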

Timeline (to be added for cloud sync)

Method	Path	Description
`GET`	`/timeline`	All snapshots, newest first
`GET`	`/timeline/{user}`	Snapshots for one user
`GET`	`/timeline/{user}/latest`	Single most recent snapshot

Misc

Method	Path	Description
`GET`	`/`	Login / registration web UI
`GET`	`/track`	Live tracking monitor page
`GET`	`/health`	`{"status": "ok"}`
`GET`	`/config`	Mode, is_pi flag, camera source

Hardware

Raspberry Pi 5 — runs this service, camera, and Ollama
Pi Camera Module — captured via libcamera / picamera2
Pan-tilt servo mount — two hobby servos (pan + tilt)
ESP32 — receives servo commands over USB serial (115200 baud), drives PWM
Laptop mode — works without Pi or servos; browser sends frames via POST

Tracking tuning (config.py)

Setting	Default	Effect
`DEAD_ZONE_PX`	`30`	Pixels off-centre before servo moves
`PID_KP`	`0.012`	Proportional gain — higher = faster
`PID_KD`	`0.003`	Derivative gain — higher = less oscillation
`SLEW_MAX_DEG`	`2.0`	Max servo jump per tick
`TARGET_FPS`	`20`	Tracker loop rate
`IDENTITY_CHECK_INTERVAL_S`	`3.0`	How often dlib re-confirms identity
`IDENTITY_TOLERANCE`	`0.50`	Face match strictness (lower = stricter)
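To show how the dead zone, the two gains, and the slew limit interact, here is a simplified per-axis update. It is not the code in pid.py: it omits any integral term, and treating the derivative as pixels per second is an assumption.

```python
def servo_step(error_px: float,
               prev_error_px: float,
               dt_s: float,
               kp: float = 0.012,
               kd: float = 0.003,
               dead_zone_px: float = 30.0,
               slew_max_deg: float = 2.0) -> float:
    """Return the angle change (degrees) to apply to one servo this tick.

    `error_px` is how far the face centre sits from the frame centre on
    this axis; `prev_error_px` is the same value from the previous tick.
    """
    if abs(error_px) < dead_zone_px:
        return 0.0  # close enough to centre: do not move, avoids jitter
    derivative = (error_px - prev_error_px) / dt_s
    delta_deg = kp * error_px + kd * derivative
    # Slew limit: never jump more than slew_max_deg in a single tick.
    return max(-slew_max_deg, min(slew_max_deg, delta_deg))
```

With the defaults, a steady 100 px error yields a step of about 1.2 degrees per tick, a 300 px error is clamped to 2.0 degrees, and anything inside 30 px produces no movement.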

Project structure

Face_login_Elara-master/
├── main.py              # FastAPI app — all HTTP endpoints
├── tracker.py           # CameraManager singleton + FaceTracker threads
├── config.py            # All tunable parameters
├── pid.py               # Discrete PID with anti-windup
├── servo.py             # ESP32 serial abstraction + simulation mode
├── analyzer.py          # 3-stage perception pipeline (HSEmotion + Moondream + Mistral)
├── monitor.py           # Polls /frames/event, runs analyzer, stores timeline
├── timeline.py          # SQLite persistence with smart deduplication
├── webcam_injector.py   # Feed laptop webcam frames to server
├── diagnose.py          # Camera and detection diagnostics
├── faces/
│   ├── db.json          # Face encoding store
│   └── timeline.db      # Perception snapshot history
├── templates/
│   ├── index.html       # Login / registration page
│   └── track.html       # Live tracking monitor
└── esp32_servo/
    └── src/main.cpp     # ESP32 firmware — serial parser + PWM servo

Graceful degradation

Missing component	Behaviour
No ESP32	Servo commands simulated (`[SIM]` shown in stream overlay)
No Pi camera	Browser-feed mode — login and tracking still work via web UI
Ollama not running	Analyzer returns empty/neutral; timeline still records presence
Tracker fails to start	`/track/` and `/frames/` return 503; login/register still work
