Kiro Ception

Your AI now remembers everything you've ever done with it, across every machine you own. Finally, an elephant-grade memory for your coding assistant, minus the 12,000-pound footprint.

Kiro Ception gives Kiro a long-term memory: persistent recall that spans every session and every window, in both the CLI and the IDE, and even across multiple machines. Your agent remembers what you discussed yesterday, last month, or six months ago, in any project, on any computer you work from. It automatically indexes all conversation history in the background and provides instant hybrid search (semantic + keyword), so you can find past discussions, decisions, and implementations by meaning, keywords, date, or any combination.

"We discussed this already..." "What was that approach we used last week?" "Didn't we solve this exact problem in the other project?" "How do I usually set up CI pipelines?"

All things you can now just ask, and actually get an answer.

How It Works

Kiro Ception is an MCP Power that runs as a background service alongside your Kiro IDE. It:

Discovers all Kiro CLI and IDE session files on your machine
Extracts meaningful messages (filtering out system prompts and boilerplate, condensing long code blocks into [code:lang] placeholders)
Embeds each message into a vector representation using your configured model
Indexes everything into an in-memory numpy matrix for instant hybrid search (semantic + FTS5 keyword)
Serves search results via MCP tools that Kiro can call naturally during conversation
Federates across machines: searches your laptop and desktop simultaneously with peer-to-peer queries (optionally encrypted)
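
The extraction step above can be illustrated with a minimal sketch of condensing long fenced code blocks into [code:lang] placeholders. The function name, the regular expression, and the max_lines cutoff are assumptions made for illustration; the project's actual rules for what counts as "long" are not documented here.

```python
import re

TICKS = "`" * 3  # a Markdown code fence, built this way to keep the example readable

# Fence, optional language tag, rest of the opening line, body, closing fence.
FENCE_RE = re.compile(TICKS + r"([\w+-]*)[^\n]*\n(.*?)" + TICKS, re.DOTALL)


def condense_code_blocks(text: str, max_lines: int = 8) -> str:
    """Replace fenced code blocks longer than max_lines with [code:lang]."""

    def _replace(match: re.Match) -> str:
        lang = match.group(1) or "text"  # untagged fences get a generic label
        body = match.group(2)
        if body.count("\n") <= max_lines:
            return match.group(0)  # short block: keep it verbatim
        return f"[code:{lang}]"

    return FENCE_RE.sub(_replace, text)
```

Short snippets survive untouched, so a one-line command someone pasted remains searchable by keyword, while a long file dump collapses to a single token.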

Sessions are processed newest first, so your most recent conversations are searchable within seconds of startup, even while older history is still being indexed in the background.

Search results include surrounding context (messages before/after each match), relevance scores, workspace origin, and pagination, so Kiro gets the full picture of what was discussed.
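
As a rough illustration of the surrounding context attached to each match, the sketch below slices a session's messages around a hit. The with_context name and the returned dictionary shape are hypothetical; only the idea of including messages before and after a match comes from the description above.

```python
def with_context(messages: list[str], match_index: int, context_size: int = 3) -> dict:
    """Return the matched message plus up to context_size neighbours on each side."""
    start = max(0, match_index - context_size)  # clamp at the session start
    end = min(len(messages), match_index + context_size + 1)  # clamp at the end
    return {
        "before": messages[start:match_index],
        "match": messages[match_index],
        "after": messages[match_index + 1:end],
    }
```

A match near the start or end of a session simply gets fewer neighbours on that side rather than raising an error.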

Architecture Highlights

Non-blocking: Heavy work (indexing, embedding) runs in background daemon threads. The MCP server responds instantly.
Hybrid search: Combines semantic vector similarity (70%) with FTS5 full-text keyword search (30%). Find things by meaning and exact names.
Recency-aware: Recent conversations rank higher automatically. The decay curve scales with your history depth, so no manual tuning is needed.
Multi-window efficient: Leader-follower pattern means multiple Kiro windows share one index in RAM. No duplication, no conflicts.
Multi-machine: Optional peer federation searches across all your computers simultaneously with AES-256-GCM encrypted transport.
Crash-safe: SQLite with WAL mode. Lose at most one in-flight message on Ctrl+C/crash/quit.
Instant cold-start: Loads from existing cache in under 1 second. No waiting for re-indexing after restarts.
Auto-migrating: Schema upgrades run automatically on startup, so updates never require deleting your cache.
Observable: Built-in status dashboard, indexing progress monitoring, hot-reloadable config, and health diagnostics, all accessible to the agent or via browser.
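
The 70/30 weighting described under "Hybrid search" can be sketched as a simple weighted sum. This assumes both scores have already been normalized to the 0-1 range; how the project normalizes FTS5 keyword ranks is not stated here, and the function names are hypothetical.

```python
SEMANTIC_WEIGHT = 0.7  # share given to vector similarity
KEYWORD_WEIGHT = 0.3   # share given to full-text keyword match


def hybrid_score(semantic: float, keyword: float) -> float:
    """Blend two scores, each assumed to lie in the range 0..1."""
    return SEMANTIC_WEIGHT * semantic + KEYWORD_WEIGHT * keyword


def rank(candidates: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Sort message ids by blended score, best first.

    candidates maps a message id to (semantic_score, keyword_score).
    """
    scored = [(msg_id, hybrid_score(s, k)) for msg_id, (s, k) in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With these weights, a message that is only moderately similar in meaning but contains the exact identifier you typed can still edge out a message that is semantically close but never mentions it.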

Installation

Prerequisites

Kiro - the AI-powered IDE
Git - for cloning/updating the power
Python 3.11+ (3.12, 3.13 also supported and tested officially)
uv - fast Python package manager

Install as a Kiro Power (Recommended)

In Kiro IDE: Powers panel → Add power from GitHub
Enter: https://github.com/DevOps-Nirvana/Kiro-Ception
Click Install. The power activates automatically when you mention "recall", "remember", or "past conversation".

The power registers itself, and "Check for updates" in the Powers panel pulls the latest version whenever you want.

Install from Source

git clone https://github.com/DevOps-Nirvana/Kiro-Ception.git
cd Kiro-Ception
uv sync

Then register it as a power: Powers panel → Add power from Local Path → select the Kiro-Ception folder you just cloned.

Manual MCP Setup (Alternative)

Warning: Installing as a Power (above) is strongly recommended. The POWER.md file contains keyword triggers and usage guidance that help Kiro automatically activate search when you reference past conversations. With an MCP-only setup, you'll need to explicitly ask Kiro to search history; it won't trigger on its own from phrases like "as we discussed" or "what did we do last time".

If you installed from source or prefer manual configuration, add the following to your Kiro MCP configuration (~/.kiro/settings/mcp.json):

{
  "mcpServers": {
    "kiro-ception": {
      "command": "uv",
      "args": ["tool", "run", "--from", "git+https://github.com/DevOps-Nirvana/Kiro-Ception", "kiro-ception"]
    }
  }
}

This uses uv tool run to fetch and run the package directly from GitHub, no local clone needed.

Alternatively, if you've cloned the repo locally:

{
  "mcpServers": {
    "kiro-ception": {
      "command": "/path/to/Kiro-Ception/.venv/bin/kiro-ception"
    }
  }
}

Replace /path/to/Kiro-Ception with the actual clone location. Saving mcp.json is usually enough for Kiro to pick up the change; if it isn't, restart Kiro.
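
If you would rather script the edit than hand-merge JSON, the following is a minimal sketch that adds the uv-based server entry to an existing mcp.json without disturbing other servers. The add_server helper is hypothetical and not part of Kiro Ception; back up your file before trying anything like it.

```python
import json
from pathlib import Path

# The same entry as the uv-based JSON example in this section.
SERVER_ENTRY = {
    "command": "uv",
    "args": [
        "tool", "run", "--from",
        "git+https://github.com/DevOps-Nirvana/Kiro-Ception", "kiro-ception",
    ],
}


def add_server(config_path: Path, name: str = "kiro-ception") -> dict:
    """Insert or overwrite one server entry, preserving any others."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = SERVER_ENTRY
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

A typical call would be add_server(Path.home() / ".kiro" / "settings" / "mcp.json").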

Configuration

Create ~/.config/kiro-ception/config.toml to customize behavior. If this file doesn't exist, sensible defaults are used (local CPU-based embeddings with all-MiniLM-L6-v2). Call the get_config tool to see the exact locations of your config file and database.

A full annotated default config is in config.default.toml; copy it as a starting point:

mkdir -p ~/.config/kiro-ception
cp config.default.toml ~/.config/kiro-ception/config.toml

Minimal Config (Zero Setup)

With no config file at all, Kiro Ception uses:

Backend: sentence-transformers (local, CPU-based, no API/GPU needed)
Model: all-MiniLM-L6-v2 (384 dimensions, ~80MB download on first run)
Sources: Auto-discovers Kiro CLI and IDE conversations in both old and new formats
Memory: Uses up to 1/3 of available RAM for the index (by default)

This is a good starting point; it runs entirely on CPU with no external dependencies.

GPU-Accelerated with Ollama (Recommended for Power Users)

If you have Ollama running with a GPU, you can use much larger, higher-quality embedding models by putting something like the following in your config file:

[embedding]
backend = "openai-compatible"
model = "qwen3-embedding:4b"
api_base = "http://localhost:11434/v1"
dimensions = 1024
batch_size = 1

Setup:

# Install Ollama (if not already): https://ollama.com
ollama pull qwen3-embedding:4b

This gives significantly better search quality than MiniLM, especially for nuanced queries. The 4b model runs comfortably on a 6GB+ GPU and indexes at ~3–5 messages/second.

OpenAI / Hosted Providers

[embedding]
backend = "openai-compatible"
model = "text-embedding-3-large"
api_base = "https://api.openai.com/v1"
api_key = "sk-..."
dimensions = 1024

LM Studio

[embedding]
backend = "openai-compatible"
model = "your-model-name"
api_base = "http://localhost:1234/v1"
dimensions = 768
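
All three openai-compatible configs above point at the same kind of endpoint: a POST to api_base + "/embeddings" carrying a model name and a list of input strings. The sketch below only builds such a request, using the standard library so it runs without dependencies (the project itself lists requests for this). The build_embedding_request name is hypothetical, and whether Kiro Ception forwards dimensions in the request body is an assumption.

```python
import json
import urllib.request


def build_embedding_request(api_base, model, texts, api_key=None, dimensions=None):
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = {"model": model, "input": texts}
    if dimensions is not None:
        payload["dimensions"] = dimensions  # assumption: forwarded to the server
    headers = {"Content-Type": "application/json"}
    if api_key:  # local servers such as Ollama typically need no key
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        api_base.rstrip("/") + "/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To check that a local server is reachable, you could pass the result to urllib.request.urlopen and inspect the JSON response; a connection error at that point usually means the server is not running or api_base is wrong.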

MCP Tools

Kiro can call these tools naturally during conversation:

Tool	Purpose
`search_project_history`	Search conversations scoped to the current workspace
`search_global_history`	Search across all workspaces (supports `source` filter: all/cli/ide)
`get_indexing_status`	Check indexer progress, rate, errors, ETA
`rescan`	Trigger a rescan for new sessions (`full=True` to re-read everything)
`get_config`	Show effective config, paths, cache stats, instance role, etc
`reload_config`	Hot-reload config from disk without requiring restart of Kiro

Search Parameters

Both search tools accept:

Parameter	Default	Description
`query`	(required)	Natural language search query
`after`	—	Only messages on/after this date (ISO 8601)
`before`	—	Only messages before this date (ISO 8601)
`context_size`	3	Messages before/after each match to include
`threshold`	0.2	Minimum similarity score (0–1)
`max_results`	10	Maximum results to return
`offset`	0	Skip results for pagination
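
To make the parameter semantics concrete, here is a small sketch of how the filters and pagination in the table could combine: after is inclusive, before is exclusive, threshold drops weak matches, and offset plus max_results select a page. The hit dictionaries and the apply_search_params name are hypothetical; the real tools run these steps inside the search engine.

```python
from datetime import datetime


def apply_search_params(hits, after=None, before=None,
                        threshold=0.2, max_results=10, offset=0):
    """Filter scored hits by date and score, then return one page of results."""
    lower = datetime.fromisoformat(after) if after else None
    upper = datetime.fromisoformat(before) if before else None
    kept = []
    for hit in hits:
        ts = datetime.fromisoformat(hit["timestamp"])
        if hit["score"] < threshold:
            continue  # below the minimum similarity
        if lower and ts < lower:
            continue  # "after" keeps messages on or after the date
        if upper and ts >= upper:
            continue  # "before" keeps messages strictly before the date
        kept.append(hit)
    kept.sort(key=lambda h: h["score"], reverse=True)
    return kept[offset:offset + max_results]
```

Requesting the next page is then just a matter of repeating the query with offset increased by max_results.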

Technologies & Libraries

Component	Library	Purpose
MCP Server	mcp (FastMCP)	Exposes tools to Kiro via Model Context Protocol
Embedding (local)	sentence-transformers	Local CPU/GPU embeddings (default: all-MiniLM-L6-v2)
Embedding (API)	requests	OpenAI-compatible HTTP API for Ollama/LM Studio/OpenAI
Vector Search	numpy	In-memory cosine similarity via dot product
Data Models	Pydantic	Typed data validation and serialization
Cache	SQLite (stdlib)	Persistent embedding + metadata storage (WAL mode)
Process Coordination	filelock	Leader-follower election via file locks
Encryption	cryptography + argon2-cffi	AES-256-GCM peer encryption with Argon2id key derivation
Build	hatchling	PEP 517 build backend
Package Manager	uv	Fast dependency resolution and venv management
Linter/Formatter	ruff	Linting and formatting
Tests	pytest	Test framework (300 tests)
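
The "cosine similarity via dot product" entry relies on a standard identity: once vectors are scaled to unit length, their dot product equals their cosine similarity. The sketch below shows this in plain Python so it runs anywhere; the project does this with a numpy matrix, where the per-row loop becomes a single matrix-vector product. The function names are illustrative.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def top_k(query, unit_rows, k=3):
    """Rank pre-normalized rows by cosine similarity to the query."""
    q = normalize(query)
    # For unit vectors, the dot product is the cosine of the angle between them.
    scores = [sum(a * b for a, b in zip(q, row)) for row in unit_rows]
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return [(index, scores[index]) for index in order[:k]]
```

Normalizing each embedding once at index time means a search needs only multiplications and additions, which is part of why in-memory lookups can be so fast.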

Optional Features

Peer Federation

Search across multiple machines (e.g., your laptop + desktop). Each machine runs its own independent index. When you search, queries fan out to all peers in parallel and results are merged.

[peers]
enabled = true
nodes = ["192.168.1.50:19742", "workpc.tailscale:19742"]
secret = "my-shared-passphrase"  # Optional: encrypts all peer traffic with AES-256-GCM
timeout_seconds = 5

Peers communicate over HTTP. If secret is set, payloads are encrypted with AES-256-GCM (key derived via Argon2id from the passphrase). Both machines must use the same secret. Without a secret, traffic is sent in plaintext, which may be acceptable on a VPN, on Tailscale, or on a trusted home network; that call is yours.
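
The fan-out-and-merge behavior can be sketched with a thread pool: query every peer in parallel, wait up to the timeout, skip peers that fail or are slow, and merge what came back by score. Here each peer is a plain callable standing in for an HTTP request; the federated_search name and the hit format are hypothetical, and encryption is left out.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def federated_search(query, peers, timeout_seconds=5.0, max_results=10):
    """Query all peers in parallel and merge their hits, best score first.

    peers maps a node name to a callable taking the query and returning
    a list of {"text": ..., "score": ...} dictionaries.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(peers)))
    futures = {pool.submit(fn, query): name for name, fn in peers.items()}
    done, _not_done = wait(futures, timeout=timeout_seconds)
    merged = []
    for future in done:
        try:
            hits = future.result()
        except Exception:
            continue  # an unreachable or failing peer is skipped
        merged.extend({**hit, "node": futures[future]} for hit in hits)
    pool.shutdown(wait=False, cancel_futures=True)  # do not block on slow peers
    merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged[:max_results]
```

Tagging each hit with its node keeps the origin visible after merging, and one offline machine costs at most the timeout rather than failing the whole search.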

Memory Limits

Control how much RAM the index uses:

[memory]
fraction = 0.33     # Use up to 1/3 of RAM (default)
# limit_mb = 512    # Or set an explicit limit
# limit_mb = 0      # Disable limit (use all available)
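
For a feel of what these settings mean in practice, the sketch below turns them into a byte budget and estimates how many embeddings fit. It assumes an explicit limit_mb takes precedence over fraction, that vectors are stored as 4-byte floats, and it ignores metadata overhead; none of these details are confirmed by the documentation above, and both function names are hypothetical.

```python
def index_budget_bytes(total_ram_bytes, fraction=0.33, limit_mb=None):
    """Return the index budget in bytes, or None when the limit is disabled."""
    if limit_mb == 0:
        return None  # limit_mb = 0 disables the limit
    if limit_mb is not None:
        return limit_mb * 1024 * 1024  # assumption: explicit limit wins
    return int(total_ram_bytes * fraction)


def max_vectors(budget_bytes, dimensions, bytes_per_value=4):
    """Estimate how many embeddings fit, assuming 4-byte floats per value."""
    return budget_bytes // (dimensions * bytes_per_value)
```

Under these assumptions a 384-dimension MiniLM vector takes 1,536 bytes and a 1024-dimension vector takes 4,096 bytes, so switching to a larger model reduces how many messages fit in the same budget.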

Indexing Throttle

Reduce GPU/CPU load during active work:

[indexing]
throttle_ms = 5000   # Sleep 5000ms (5 seconds) between embedding batches (default: 0)
rescan_interval_minutes = 10  # Check for new sessions every 10 minutes (this is the default)

Once your initial index is built, setting throttle_ms to 5-10 seconds (5000-10000) helps keep your computer responsive while indexing continues in the background. This is especially valuable if you are using a large local GPU-based model.

If you want to save battery, or don't need the index brought up to date quickly, you can raise rescan_interval_minutes to 60, or disable the automated rescan entirely by setting it to 0.
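
Conceptually, the throttle is just a pause between embedding batches. The sketch below shows the idea with an injectable sleep function so it can be exercised without waiting; embed_in_batches and its parameters are illustrative, not the project's internals.

```python
import time


def embed_in_batches(messages, embed, batch_size=1, throttle_ms=0, sleep=time.sleep):
    """Embed messages batch by batch, pausing throttle_ms between batches."""
    vectors = []
    for start in range(0, len(messages), batch_size):
        vectors.extend(embed(messages[start:start + batch_size]))
        more_to_do = start + batch_size < len(messages)
        if throttle_ms > 0 and more_to_do:
            sleep(throttle_ms / 1000)  # yield the CPU/GPU to foreground work
    return vectors
```

With batch_size = 1 and throttle_ms = 5000, at most one message is embedded roughly every five seconds, which is why throttling suits steady-state use better than the initial bulk index.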

Performance

Metric	Value
First-time indexing (MiniLM, CPU)	~4 minutes (4300+ sessions)
First-time indexing (Qwen3-Embedding:4b, GPU)	~35 minutes (4300+ sessions)
Subsequent startups	<2 seconds
Search latency	<10ms
Index refresh (backgrounded)	Every 60 seconds
Periodic rescan to update indexes (backgrounded)	Every 10 minutes
Embedding rate (Qwen3-Embedding:4b)	~3–5 messages/second

Indexing order: Sessions are indexed newest first, so your most recent conversations become searchable within seconds of startup. Older conversations fill in progressively in the background.
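
Newest-first processing amounts to sorting session files before working through them. The sketch below orders paths by file modification time, which is an assumption; the project may instead read timestamps stored inside the sessions.

```python
from pathlib import Path


def newest_first(session_paths):
    """Order session files so the most recently modified come first."""
    return sorted(session_paths, key=lambda p: p.stat().st_mtime, reverse=True)
```

Feeding an indexing queue in this order means yesterday's conversation is searchable long before a six-month-old one has been embedded.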

Troubleshooting

"Backend not ready" or "still loading"

On first startup, the index eagerly loads from SQLite into RAM. If embeddings exist but metadata hasn't populated yet, you'll see a "still loading" message; retry in a few seconds. Loading also takes longer as your embeddings database grows. For reference, I have six months of Kiro work across 4300 chat documents in a (currently) 300MB embedding database, and loading the index into RAM takes 10-15 seconds.

Empty search results

Check get_indexing_status; indexing may still be in progress
Use rescan() to immediately pick up recent conversations
Verify your config with get_config
Check the "Kiro Powers / MCP" log

Embedding errors / timeouts

For Ollama: ensure it's running (ollama ps) and the model is pulled
Very long messages (>50K chars) may time out; they're skipped with a warning
Check your "Kiro Powers" outputs for logs/errors

Config changes not taking effect

Use the reload_config tool (applies safe changes immediately)
Model/backend/dimensions changes require rescan(full=True)

Multiple windows fighting

The leader-follower pattern handles this automatically. Use get_config to see which process is leader. If a leader dies, the next request attempts to auto-promote a follower.
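
The election idea can be sketched with a non-blocking exclusive file lock: whichever process takes the lock is the leader and records its PID and port for followers to read. The project lists the filelock library for this; the sketch substitutes the standard library's fcntl.flock, which works on Unix-like systems only. The try_become_leader name and the JSON field names are hypothetical.

```python
import fcntl
import json
import os


def try_become_leader(lock_path, info_path, port):
    """Return the open lock file if this process became leader, else None."""
    lock_file = open(lock_path, "a+")
    try:
        # Non-blocking: fail immediately if another holder has the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        lock_file.close()
        return None  # someone else is leader; act as a follower
    with open(info_path, "w") as fh:
        json.dump({"pid": os.getpid(), "port": port}, fh)
    return lock_file  # keep it open: closing the file releases the lock
```

Because the operating system releases the lock when the holding process exits, even after a crash, a follower that later retries the same call can take over, which matches the auto-promotion behavior described above.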

Nuclear option

If the database is corrupt or everything is broken, call the get_config tool to find the path to your database. Then uninstall this power (or disable the MCP server), remove the database, and reinstall the power (or re-enable the MCP server).

rm -rf ~/.cache/kiro-ception/

When you restart Kiro (or re-enable the MCP server), it will rebuild the embeddings database from scratch.

Development

uv sync                         # Install deps
uv run pytest tests/ -q         # Run tests (300, ~30s)
uv run ruff check src/          # Lint
uv run kiro-ception             # Run MCP server locally

Data Locations

To see where your data is kept, call the MCP tool get_config. On a Unix-like system, the files are at:

Path	Contents
`~/.config/kiro-ception/config.toml`	User configuration
`~/.cache/kiro-ception/cache_<hash>.db`	SQLite database (embeddings, metadata)
`~/.cache/kiro-ception/leader.lock`	Leader election file lock
`~/.cache/kiro-ception/leader.json`	Leader port/PID info for followers

The cache DB filename includes a hash of the backend configuration. Changing model/backend/dimensions creates a new DB file (old ones are preserved for rollback).
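
One way to picture the per-configuration filename is a short digest of the settings that affect embeddings. The sketch below hashes backend, model, and dimensions; the exact fields, hash function, and digest length the project uses are not documented here, so treat all three as assumptions.

```python
import hashlib
import json


def cache_filename(backend, model, dimensions):
    """Derive a stable cache filename from the embedding configuration."""
    key = json.dumps(
        {"backend": backend, "model": model, "dimensions": dimensions},
        sort_keys=True,  # stable key order gives a stable digest
    )
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
    return f"cache_{digest}.db"
```

Since vectors from different models are not comparable, keying the file on the configuration keeps them from ever mixing, and switching back to an earlier model reuses its old file.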

Privacy: All data is processed and stored locally on your machine. There is no telemetry, there are no external API calls, and no data leaves your device unless you explicitly configure a third-party embedding provider (e.g., OpenAI). The default configuration uses fully local, offline embeddings.

Support

Found a bug? Have a feature request? Open an issue on GitHub.

License

MIT - See: LICENSE.

Attribution

Built by Farley Farley (DevOps-Nirvana), based upon Kiro Total Recall by Danilo Poccia (MIT licensed). The original session loaders, data models, and core embed/search concept originate from that project. Kiro Ception is a ground-up rewrite for production use; see the Architecture Highlights above for what's different.
