Kiro Ception

Your AI now remembers everything you've ever done with it, across every machine you own. Finally, an elephant-grade memory for your coding assistant, minus the 12,000-pound footprint.

Kiro Ception gives Kiro a long-term memory: persistent recall that spans every session, every window, the CLI and the IDE, and even multiple machines. Your agent remembers what you discussed yesterday, last month, or six months ago, in any project, on any computer you work from. It automatically indexes all conversation history in the background and provides instant hybrid search (semantic + keyword), so you can find past discussions, decisions, and implementations by meaning, keywords, date, or any combination.

"We discussed this already..." "What was that approach we used last week?" "Didn't we solve this exact problem in the other project?" "How did I usually set up CI pipelines?"

  • All things you can now just ask, and actually get an answer.

How It Works

Kiro Ception is an MCP Power that runs as a background service alongside your Kiro IDE. It:

  1. Discovers all Kiro CLI and IDE session files on your machine
  2. Extracts meaningful messages (filtering out system prompts and boilerplate, condensing long code blocks into [code:lang] placeholders)
  3. Embeds each message into a vector representation using your configured model
  4. Indexes everything into an in-memory numpy matrix for instant hybrid search (semantic + FTS5 keyword)
  5. Serves search results via MCP tools that Kiro can call naturally during conversation
  6. Federates across machines: search your laptop and desktop simultaneously with encrypted peer-to-peer queries
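Step 2 mentions condensing long code blocks into [code:lang] placeholders. The following is a minimal sketch of that idea, assuming the messages contain Markdown-style fenced blocks. The function name, the regular expression, and the max_lines threshold are illustrative assumptions, not taken from the project's source.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep this example readable

# Matches: fence, optional language tag, newline, body, closing fence.
FENCE_RE = re.compile(
    re.escape(FENCE) + r"([\w+-]*)\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def condense_code_blocks(text: str, max_lines: int = 10) -> str:
    """Replace fenced code blocks longer than max_lines with [code:lang]."""

    def _replace(match: re.Match) -> str:
        lang = match.group(1) or "text"  # untagged blocks get a generic label
        body = match.group(2)
        if body.count("\n") <= max_lines:
            return match.group(0)  # short blocks are kept verbatim
        return f"[code:{lang}]"

    return FENCE_RE.sub(_replace, text)
```

Keeping short snippets while collapsing long ones preserves the conversational signal that embeddings rely on, without letting large pasted files dominate a message.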

Sessions are processed newest first, so your most recent conversations are searchable within seconds of startup, even while older history is still being indexed in the background.

Search results include surrounding context (messages before/after each match), relevance scores, workspace origin, and pagination, so Kiro gets the full picture of what was discussed.

Architecture Highlights

  • Non-blocking: Heavy work (indexing, embedding) runs in background daemon threads. The MCP server responds instantly.
  • Hybrid search: Combines semantic vector similarity (70%) with FTS5 full-text keyword search (30%). Find things by meaning and exact names.
  • Recency-aware: Recent conversations rank higher automatically. The decay curve scales with your history depth, so no manual tuning is needed.
  • Multi-window efficient: Leader-follower pattern means multiple Kiro windows share one index in RAM. No duplication, no conflicts.
  • Multi-machine: Optional peer federation searches across all your computers simultaneously with AES-256-GCM encrypted transport.
  • Crash-safe: SQLite with WAL mode. Lose at most one in-flight message on Ctrl+C/crash/quit.
  • Instant cold-start: Loads from existing cache in under 1 second. No waiting for re-indexing after restarts.
  • Auto-migrating: Schema upgrades run automatically on startup, so updates never require deleting your cache.
  • Observable: Built-in status dashboard, indexing progress monitoring, hot-reloadable config, and health diagnostics, all accessible to the agent or via browser.
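The hybrid-search bullet above describes a 70/30 blend of semantic and keyword scores. Below is a rough sketch of how such a blend could be computed, assuming both scores are already scaled to the 0..1 range. The project uses a numpy matrix; this sketch uses plain Python lists to stay dependency-free, and all function names are hypothetical. How the project normalizes FTS5 ranks and applies recency decay is not stated in this README, so neither is modeled here.

```python
import math

SEMANTIC_WEIGHT = 0.7  # weights quoted in the Architecture Highlights
KEYWORD_WEIGHT = 0.3


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Cosine similarity computed as a dot product of unit-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def hybrid_score(semantic, keyword):
    """Blend a semantic score and a keyword score, both assumed to be in 0..1."""
    return SEMANTIC_WEIGHT * semantic + KEYWORD_WEIGHT * keyword


def rank(query_vec, docs):
    """docs is a list of (doc_id, vector, keyword_score); best match first."""
    scored = [
        (doc_id, hybrid_score(cosine(query_vec, vec), keyword))
        for doc_id, vec, keyword in docs
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With these weights, a perfect keyword hit on an unrelated message (0.3) still ranks below a strong semantic match (0.7), which matches the stated intent of finding things by meaning first and exact names second.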

Installation

Prerequisites

  • Kiro - the AI-powered IDE
  • Git - for cloning/updating the power
  • Python 3.11+ (3.12, 3.13 also supported and tested officially)
  • uv - fast Python package manager

Install as a Kiro Power (Recommended)

  1. In Kiro IDE: Powers panel → Add power from GitHub
  2. Enter: https://github.com/DevOps-Nirvana/Kiro-Ception
  3. Click Install. The power activates automatically when you mention "recall", "remember", or "past conversation".

The power registers itself, and "Check for updates" in the Powers panel pulls the latest version whenever you want.

Install from Source

git clone https://github.com/DevOps-Nirvana/Kiro-Ception.git
cd Kiro-Ception
uv sync

Then register it as a power: Powers panel → Add power from Local Path → select the Kiro-Ception folder you just cloned.

Manual MCP Setup (Alternative)

Warning: Installing as a Power (above) is strongly recommended. The POWER.md file contains keyword triggers and usage guidance that help Kiro automatically activate search when you reference past conversations. With an MCP-only setup, you'll need to explicitly ask Kiro to search history; it won't trigger on its own from phrases like "as we discussed" or "what did we do last time".

If you installed from source or prefer manual configuration, add the following to your Kiro MCP configuration (~/.kiro/settings/mcp.json):

{
  "mcpServers": {
    "kiro-ception": {
      "command": "uv",
      "args": ["tool", "run", "--from", "git+https://github.com/DevOps-Nirvana/Kiro-Ception", "kiro-ception"]
    }
  }
}

This uses uv tool run to fetch and run the package directly from GitHub; no local clone is needed.

Alternatively, if you've cloned the repo locally:

{
  "mcpServers": {
    "kiro-ception": {
      "command": "/path/to/Kiro-Ception/.venv/bin/kiro-ception"
    }
  }
}

Replace /path/to/Kiro-Ception with the actual clone location. Saving mcp.json is usually enough for Kiro to pick up the change; if it is not, restart Kiro.
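If you already have other MCP servers configured, editing the JSON by hand risks overwriting them. The sketch below merges the kiro-ception entry into an existing file instead of replacing it. The add_server helper is hypothetical and not part of the project; the entry itself is copied from the first JSON example above.

```python
import json
from pathlib import Path

# Server entry copied from the uv-based example in this README.
ENTRY = {
    "command": "uv",
    "args": [
        "tool", "run", "--from",
        "git+https://github.com/DevOps-Nirvana/Kiro-Ception", "kiro-ception",
    ],
}


def add_server(config_path: Path, name: str = "kiro-ception", entry: dict = ENTRY) -> dict:
    """Merge one server entry into mcp.json, keeping any existing servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

To apply it to the default location, call add_server(Path.home() / ".kiro" / "settings" / "mcp.json"). Back up the file first if it contains settings you care about.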

Configuration

Create ~/.config/kiro-ception/config.toml to customize behavior. If this file doesn't exist, sensible defaults are used (local CPU-based embeddings with all-MiniLM-L6-v2). Call the get_config tool to see the exact locations of your config file and database.

A full annotated default config is in config.default.toml; copy it as a starting point:

mkdir -p ~/.config/kiro-ception
cp config.default.toml ~/.config/kiro-ception/config.toml

Minimal Config (Zero Setup)

With no config file at all, Kiro Ception uses:

  • Backend: sentence-transformers (local, CPU-based, no API/GPU needed)
  • Model: all-MiniLM-L6-v2 (384 dimensions, ~80MB download on first run)
  • Sources: Auto-discovers Kiro CLI and IDE conversations in both old and new formats
  • Memory: Uses up to 1/3 of available RAM for the index (by default)

This is a good starting point; it runs entirely on CPU with no external dependencies.

GPU-Accelerated with Ollama (Recommended for Power Users)

If you have Ollama running with a GPU, you can use much larger, higher-quality embedding models by putting something like the following in your config file:

[embedding]
backend = "openai-compatible"
model = "qwen3-embedding:4b"
api_base = "http://localhost:11434/v1"
dimensions = 1024
batch_size = 1

Setup:

# Install Ollama (if not already): https://ollama.com
ollama pull qwen3-embedding:4b

This gives significantly better search quality than MiniLM, especially for nuanced queries. The 4b model runs comfortably on a 6GB+ GPU and indexes at ~3–5 messages/second.

OpenAI / Hosted Providers

[embedding]
backend = "openai-compatible"
model = "text-embedding-3-large"
api_base = "https://api.openai.com/v1"
api_key = "sk-..."
dimensions = 1024

LM Studio

[embedding]
backend = "openai-compatible"
model = "your-model-name"
api_base = "http://localhost:1234/v1"
dimensions = 768
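The Ollama, OpenAI, and LM Studio examples all use the openai-compatible backend, so one quick way to sanity-check an api_base before indexing is to send a single embeddings request yourself. The sketch below builds a request for the standard OpenAI-style /embeddings endpoint using only the standard library. The function names are hypothetical, and whether Kiro Ception itself sends the dimensions field is an assumption; it is optional here because not every server honors it.

```python
import json
import urllib.request


def build_embedding_request(api_base, model, texts, api_key=None, dimensions=None):
    """Build (but do not send) a POST to an OpenAI-compatible /embeddings endpoint."""
    payload = {"model": model, "input": list(texts)}
    if dimensions is not None:
        payload["dimensions"] = dimensions  # optional; some servers ignore or reject it
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        api_base.rstrip("/") + "/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def embed(request, timeout=30):
    """Send the request and return one vector per input text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return [item["embedding"] for item in body["data"]]
```

For example, embed(build_embedding_request("http://localhost:11434/v1", "qwen3-embedding:4b", ["hello"])) should return a single vector if Ollama is running and the model is pulled. Comparing the vector's length with the dimensions value in your config is a cheap way to catch a mismatch early.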

MCP Tools

Kiro can call these tools naturally during conversation:

| Tool | Purpose |
| --- | --- |
| search_project_history | Search conversations scoped to the current workspace |
| search_global_history | Search across all workspaces (supports source filter: all/cli/ide) |
| get_indexing_status | Check indexer progress, rate, errors, ETA |
| rescan | Trigger a rescan for new sessions (full=True to re-read everything) |
| get_config | Show effective config, paths, cache stats, instance role, etc. |
| reload_config | Hot-reload config from disk without requiring a restart of Kiro |

Search Parameters

Both search tools accept:

| Parameter | Default | Description |
| --- | --- | --- |
| query | (required) | Natural language search query |
| after | | Only messages on/after this date (ISO 8601) |
| before | | Only messages before this date (ISO 8601) |
| context_size | 3 | Messages before/after each match to include |
| threshold | 0.2 | Minimum similarity score (0–1) |
| max_results | 10 | Maximum results to return |
| offset | 0 | Skip results for pagination |
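To make the parameters concrete, here is a small sketch that assembles an argument dictionary with the documented defaults and pages through results using offset and max_results. Kiro normally builds these arguments itself during conversation; the helper names below are hypothetical and exist only to illustrate how the parameters fit together.

```python
from datetime import date


def build_search_args(query, after=None, before=None, context_size=3,
                      threshold=0.2, max_results=10, offset=0):
    """Assemble search tool arguments using the documented defaults."""
    if not query:
        raise ValueError("query is required")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    args = {
        "query": query,
        "context_size": context_size,
        "threshold": threshold,
        "max_results": max_results,
        "offset": offset,
    }
    # Dates are optional and are sent as ISO 8601 strings when present.
    if after is not None:
        args["after"] = after.isoformat()
    if before is not None:
        args["before"] = before.isoformat()
    return args


def next_page(args):
    """Return a copy of args advanced by one page of max_results."""
    return {**args, "offset": args["offset"] + args["max_results"]}
```

For instance, build_search_args("CI pipeline setup", after=date(2025, 1, 1)) restricts the search to messages on or after 1 January 2025, and next_page moves the offset from 0 to 10 for the second page.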

Technologies & Libraries

| Component | Library | Purpose |
| --- | --- | --- |
| MCP Server | mcp (FastMCP) | Exposes tools to Kiro via Model Context Protocol |
| Embedding (local) | sentence-transformers | Local CPU/GPU embeddings (default: all-MiniLM-L6-v2) |
| Embedding (API) | requests | OpenAI-compatible HTTP API for Ollama/LM Studio/OpenAI |
| Vector Search | numpy | In-memory cosine similarity via dot product |
| Data Models | Pydantic | Typed data validation and serialization |
| Cache | SQLite (stdlib) | Persistent embedding + metadata storage (WAL mode) |
| Process Coordination | filelock | Leader-follower election via file locks |
| Encryption | cryptography + argon2-cffi | AES-256-GCM peer encryption with Argon2id key derivation |
| Build | hatchling | PEP 517 build backend |
| Package Manager | uv | Fast dependency resolution and venv management |
| Linter/Formatter | ruff | Linting and formatting |
| Tests | pytest | Test framework (300 tests) |
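The Cache row relies on SQLite's write-ahead logging for crash safety. As a point of reference, this is roughly how a database is switched to WAL mode with the standard library's sqlite3 module. The open_cache function and the messages table are hypothetical; the project's real schema is not shown in this README.

```python
import sqlite3


def open_cache(path):
    """Open a SQLite database file and switch it to write-ahead logging."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL, journal_mode is {mode!r}")
    # Hypothetical table used only for this illustration.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)"
    )
    conn.commit()
    return conn
```

In WAL mode, committed transactions are appended to a separate log file before being merged into the main database, so an interrupted process loses at most the transaction that was in flight.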

Optional Features

Peer Federation

Search across multiple machines (e.g., your laptop + desktop). Each machine runs its own independent index. When you search, queries fan out to all peers in parallel and results are merged.

[peers]
enabled = true
nodes = ["192.168.1.50:19742", "workpc.tailscale:19742"]
secret = "my-shared-passphrase"  # Optional: encrypts all peer traffic with AES-256-GCM
timeout_seconds = 5

Peers communicate over HTTP. If secret is set, payloads are encrypted with AES-256-GCM (key derived via Argon2id from the passphrase). Both machines must use the same secret. Without a secret, traffic is sent in plaintext, which may be acceptable on a VPN, on Tailscale, or on a trusted home network; the choice is yours.
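The fan-out-and-merge behavior described above can be sketched with a thread pool. This is an illustration of the pattern only: query_peer is a hypothetical callable standing in for the real (optionally encrypted) HTTP call, and the result shape, a list of dictionaries with a score key, is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def federated_search(query, peers, query_peer, local_results=(),
                     timeout_seconds=5, max_results=10):
    """Query every peer in parallel and merge hits by descending score.

    query_peer(peer, query) is a hypothetical callable that returns a list of
    dicts with a "score" key. Peers that fail or exceed the timeout are skipped.
    """
    merged = list(local_results)
    pool = ThreadPoolExecutor(max_workers=max(1, len(peers)))
    try:
        futures = [pool.submit(query_peer, peer, query) for peer in peers]
        done, _not_done = wait(futures, timeout=timeout_seconds)
        for future in done:
            if future.exception() is None:
                merged.extend(future.result())
    finally:
        pool.shutdown(wait=False)  # do not block the caller on slow peers
    merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged[:max_results]
```

Skipping unreachable peers rather than failing the whole search means a powered-off desktop only reduces coverage; local results are still returned within the timeout.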

Memory Limits

Control how much RAM the index uses:

[memory]
fraction = 0.33     # Use up to 1/3 of RAM (default)
# limit_mb = 512    # Or set an explicit limit
# limit_mb = 0      # Disable limit (use all available)
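For a feel of what these limits mean in practice, the sketch below turns the settings into a budget and estimates how many vectors fit. It assumes that an explicit limit_mb takes precedence over fraction, that 0 disables the limit, and that vectors are stored as 4-byte floats; none of these details are confirmed by this README, and metadata overhead is ignored.

```python
def memory_budget_mb(total_ram_mb, fraction=0.33, limit_mb=None):
    """Return the index budget in MB, or None when the limit is disabled."""
    if limit_mb is not None:
        # Assumption: an explicit limit overrides the fraction; 0 means unlimited.
        return None if limit_mb == 0 else float(limit_mb)
    return total_ram_mb * fraction


def vectors_that_fit(budget_mb, dimensions, bytes_per_value=4):
    """Estimate how many embedding vectors fit, counting vector data only."""
    bytes_per_vector = dimensions * bytes_per_value
    return int(budget_mb * 1024 * 1024 // bytes_per_vector)
```

Under these assumptions, a 384-dimension vector takes 1,536 bytes, so a 512 MB budget holds roughly 349,000 vectors. A 1024-dimension model needs about 2.7 times as much memory per message.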

Indexing Throttle

Reduce GPU/CPU load during active work:

[indexing]
throttle_ms = 5000   # Sleep 5000ms (5 seconds) between embedding batches (default: 0)
rescan_interval_minutes = 10  # Check for new sessions every 10 minutes (this is the default)

Once your initial index is built, consider setting throttle_ms to 5-10 seconds (5000-10000) so that background indexing does not slow down your other work. This is especially valuable if you are using a large local GPU-based model.

If you want to conserve battery life, or you don't need the index to be brought up to date quickly, you can increase rescan_interval_minutes to 60, or disable the automated rescan entirely by setting it to 0.
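The effect of throttle_ms can be pictured as a pause between embedding batches. The loop below is a sketch of that behavior, not the project's indexer; index_batches and embed_batch are hypothetical names, and the sleep function is injectable so the loop can be exercised without real delays.

```python
import time


def index_batches(batches, embed_batch, throttle_ms=0, sleep=time.sleep):
    """Embed batches in order, pausing throttle_ms between consecutive batches."""
    processed = 0
    for i, batch in enumerate(batches):
        if i > 0 and throttle_ms > 0:
            sleep(throttle_ms / 1000.0)  # yield CPU/GPU time to foreground work
        embed_batch(batch)
        processed += len(batch)
    return processed
```

With throttle_ms = 5000 and three pending batches, the loop sleeps twice for five seconds each, so new messages still get indexed, just at a gentler pace.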

Performance

| Metric | Value |
| --- | --- |
| First-time indexing (MiniLM, CPU) | ~4 minutes (4300+ sessions) |
| First-time indexing (Qwen3-Embedding:4b, GPU) | ~35 minutes (4300+ sessions) |
| Subsequent startups | <2 seconds |
| Search latency | <10ms |
| Index refresh (backgrounded) | Every 60 seconds |
| Periodic rescan to update indexes (backgrounded) | Every 10 minutes |
| Embedding rate (Qwen3-Embedding:4b) | ~3–5 messages/second |

Indexing order: Sessions are indexed newest first, so your most recent conversations become searchable within seconds of startup. Older conversations fill in progressively in the background.
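Newest-first ordering can be approximated by sorting session files on modification time. This sketch assumes that mtime is a good proxy for recency and that sessions are stored as .json files under a single root; both are illustrative assumptions, since the real discovery logic and file formats are not described here.

```python
from pathlib import Path


def sessions_newest_first(root, pattern="*.json"):
    """Return session files under root, most recently modified first."""
    files = [p for p in Path(root).rglob(pattern) if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

Feeding the indexer from the front of this list is what makes yesterday's conversation searchable long before a six-month-old one.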

Troubleshooting

"Backend not ready" or "still loading"

On first startup, the index eagerly loads from SQLite into RAM. If embeddings exist but metadata hasn't been populated yet, you'll see a "still loading" message; retry in a few seconds. Loading takes longer as your embeddings database grows. For reference, I have six months of Kiro work across 4300 chat documents with a (currently) 300MB embedding database, and it takes 10-15 seconds to load the index into RAM.

Empty search results

  • Check get_indexing_status; indexing may still be in progress
  • Use rescan() to immediately pick up recent conversations
  • Verify your config with get_config
  • Check the "Kiro Powers / MCP" log

Embedding errors / timeouts

  • For Ollama: ensure it's running (ollama ps) and the model is pulled
  • Very long messages (>50K chars) may time out; they're skipped with a warning
  • Check your "Kiro Powers" outputs for logs/errors

Config changes not taking effect

  • Use the reload_config tool (applies safe changes immediately)
  • Model/backend/dimensions changes require rescan(full=True)

Multiple windows fighting

The leader-follower pattern handles this automatically. Use get_config to see which process is leader. If a leader dies, the next request attempts to auto-promote a follower.
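To illustrate how a file lock can decide which window becomes leader, here is a Unix-only sketch that uses the standard library's fcntl module as a stand-in for the filelock package the project actually uses. The function names are hypothetical, and the real implementation also publishes the leader's port and PID, which is not modeled here.

```python
import fcntl
import os


def try_become_leader(lock_path):
    """Return an open file descriptor if this process won the lock, else None."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Non-blocking exclusive lock: fails immediately if someone else holds it.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None  # another process is leader; act as a follower
    return fd  # keep it open: the OS releases the lock when the process exits


def resign(fd):
    """Release leadership explicitly so that a follower can take over."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Because the operating system drops the lock when the holding process dies, a follower that retries try_become_leader after a crash can promote itself without any manual cleanup.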

Nuclear option

If the database is corrupt or everything is broken, call the get_config tool to find the path to your database. Then uninstall this power (or disable the MCP server), remove the database, and reinstall the power (or re-enable the MCP server).

rm -rf ~/.cache/kiro-ception/

When you restart Kiro (or re-enable the MCP server), it will rebuild the embeddings database from scratch.

Development

uv sync                         # Install deps
uv run pytest tests/ -q         # Run tests (300, ~30s)
uv run ruff check src/          # Lint
uv run kiro-ception             # Run MCP server locally

Data Locations

To see where your data is kept, call the MCP tool "get_config". On a Unix-like system, the files are located at:

| Path | Contents |
| --- | --- |
| ~/.config/kiro-ception/config.toml | User configuration |
| ~/.cache/kiro-ception/cache_<hash>.db | SQLite database (embeddings, metadata) |
| ~/.cache/kiro-ception/leader.lock | Leader election file lock |
| ~/.cache/kiro-ception/leader.json | Leader port/PID info for followers |

The cache DB filename includes a hash of the backend configuration. Changing model/backend/dimensions creates a new DB file (old ones are preserved for rollback).
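One way to derive such a filename is to hash the settings that affect the embeddings. The sketch below is an assumption about the general approach, not the project's actual scheme: the hash algorithm, the fields included, and the digest length are all hypothetical.

```python
import hashlib
import json


def cache_filename(backend, model, dimensions, length=12):
    """Derive a cache file name from the settings that affect embeddings."""
    # sort_keys makes the serialized form, and therefore the hash, deterministic.
    key = json.dumps(
        {"backend": backend, "model": model, "dimensions": dimensions},
        sort_keys=True,
    )
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:length]
    return f"cache_{digest}.db"
```

Because vectors from different models are not comparable, keying the file on the embedding configuration keeps each model's index separate, and switching back to an earlier configuration can reuse its existing file.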

Privacy: All data is processed and stored locally on your machine. There is no telemetry, there are no external API calls, and no data leaves your device unless you explicitly configure a third-party embedding provider (e.g., OpenAI). The default configuration uses fully local, offline embeddings.

Support

Found a bug? Have a feature request? Open an issue on GitHub.

License

MIT - See: LICENSE.

Attribution

Built by Farley Farley (DevOps-Nirvana), based upon Kiro Total Recall by Danilo Poccia (MIT licensed). The original session loaders, data models, and core embed/search concept originate from that project. Kiro Ception is a ground-up rewrite for production use; see the Architecture Highlights above for what's different.

About

Kiro MCP Power that gives Kiro long-term memory. Indexes all CLI and IDE conversations into a semantic search engine using local or API-based embeddings. Newest-first indexing, leader-follower coordination, multi-computer fan-out search, and instant cosine similarity search. This makes Kiro weird and sci-fi like...
