Qdrant Code Indexing for Kilo Code

This repository contains a Docker Compose configuration and PowerShell scripts to set up local code indexing for Kilo Code using the Qdrant vector database and HuggingFace embeddings.

Overview

Kilo Code supports semantic code search and context-aware assistance through vector embeddings. This setup provides:

Qdrant: Vector database for storing code embeddings
Text Embeddings Inference: HuggingFace GPU-accelerated service for generating embeddings from code
Optimized for Performance: Uses GPU acceleration for fast code indexing
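
To make the idea of semantic search over embeddings concrete, here is a minimal sketch that ranks stored vectors by cosine similarity to a query vector. The three-dimensional vectors and the snippet names are made up for illustration; vectors from this setup have 768 dimensions, and Qdrant performs the nearest-neighbor search itself. The choice of cosine similarity is an assumption, since the metric Kilo Code configures is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, stored):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors (hypothetical, for illustration only).
stored = {
    "parse_config": [0.9, 0.1, 0.0],
    "render_button": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, stored)
print(ranking[0][0])  # prints "parse_config"
```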

Architecture

graph LR
    A[Kilo Code] --> B[Embeddings Service]
    A --> C[Qdrant Vector DB]
    B --> D[HuggingFace Model]
    D --> E[jinaai/jina-embeddings-v2-base-code]
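
The flow in the diagram can be sketched as two kinds of requests to Qdrant's REST API: one that stores vectors returned by the embeddings service, and one that searches with a query vector. The sketch below only builds the requests; it does not send them. The collection name `code_chunks` and the payload fields are hypothetical, since the schema Kilo Code actually uses is not documented here, and the search path shown is Qdrant's older `points/search` endpoint.

```python
import json

QDRANT_URL = "http://localhost:6333"
COLLECTION = "code_chunks"  # hypothetical collection name


def build_upsert(points):
    """Build a Qdrant upsert request from (id, vector, payload) tuples."""
    url = f"{QDRANT_URL}/collections/{COLLECTION}/points"
    body = {
        "points": [
            {"id": point_id, "vector": vector, "payload": payload}
            for point_id, vector, payload in points
        ]
    }
    return "PUT", url, body


def build_search(query_vector, limit=5):
    """Build a Qdrant similarity search request for a query vector."""
    url = f"{QDRANT_URL}/collections/{COLLECTION}/points/search"
    body = {"vector": query_vector, "limit": limit, "with_payload": True}
    return "POST", url, body


# Toy 2-dimensional vector and payload (hypothetical).
method, url, body = build_upsert([(1, [0.1, 0.2], {"path": "src/app.py"})])
print(method, url)
print(json.dumps(body))
```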

Prerequisites

Docker Desktop with Docker Compose support
Windows 11 with PowerShell
NVIDIA GPU with CUDA support
NVIDIA Container Toolkit (for GPU access in Docker)

Quick Start

1. Start Services

Run the start script to launch both Qdrant and the GPU embeddings service:

.\start_code_indexing.ps1

This will start Docker Compose with the GPU profile for fast, GPU-accelerated code indexing.

2. Configure Kilo Code

Open the codebase indexing settings in Kilo Code and set the following values:

Enable Codebase Indexing: checked
Embedder Provider: OpenAI Compatible
Base URL: http://localhost:8080
API Key: DUMMY_KEY
Model: jinaai/jina-embeddings-v2-base-code
Model Dimension: 768
Vector Store Provider: Qdrant
Qdrant URL: http://localhost:6333
Qdrant API Key: (empty)
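
One way to sanity-check the Base URL, Model, and Model Dimension values is to request an embedding by hand and look at the vector length. The sketch below uses the OpenAI-style `/v1/embeddings` route that Text Embeddings Inference exposes; the exact path Kilo Code appends to the Base URL is not stated here, so treat the path as an assumption. The helper names are hypothetical.

```python
import json
import urllib.request


def build_embeddings_request(base_url, model, texts):
    """Return the URL and JSON payload for an OpenAI-style embeddings call."""
    url = base_url.rstrip("/") + "/v1/embeddings"
    payload = {"model": model, "input": list(texts)}
    return url, payload


def embedding_dimensions(response):
    """Return the set of vector lengths found in an OpenAI-style response."""
    return {len(item["embedding"]) for item in response["data"]}


def fetch_embeddings(base_url, model, texts, api_key="DUMMY_KEY", timeout=30):
    """Send the request to a running embeddings service and decode the JSON reply."""
    url, payload = build_embeddings_request(base_url, model, texts)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the services running, `embedding_dimensions(fetch_embeddings("http://localhost:8080", "jinaai/jina-embeddings-v2-base-code", ["def add(a, b): return a + b"]))` should contain only the value entered as Model Dimension.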

Also review the advanced settings before saving.

Once you have done this, click Save at the bottom. This enables the Start Indexing button to the left; click it to begin indexing.

Important Notes:

Both services must be running (via start_code_indexing.ps1) for code indexing to work
Use localhost (not 127.0.0.1 or 0.0.0.0) for compatibility
Port numbers (6333 and 8080) match the exposed ports in compose.yml
If you see connection errors, verify containers are running: docker ps
For the jinaai/jina-embeddings-v2-base-code model, the Embedding Batch Size setting cannot exceed 30.
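
To illustrate what a batch size limit of 30 means in practice, here is a small sketch that splits a list of code chunks into batches no larger than the limit. Kilo Code does its own batching based on the Embedding Batch Size setting; the function name and the sample chunks are hypothetical.

```python
def batched(items, batch_size=30):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 65 placeholder chunks split into batches of at most 30.
chunks = [f"chunk-{i}" for i in range(65)]
sizes = [len(batch) for batch in batched(chunks)]
print(sizes)  # prints [30, 30, 5]
```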

3. Stop Services

When you're done or need to restart:

.\stop_code_indexing.ps1

This will cleanly shut down all containers and remove any orphaned containers.

Services Configuration

Qdrant Vector Database

Ports:
- 6333: HTTP API
- 6334: gRPC API
Data Persistence: Volume qdrant_data stores indexed code
Container Name: qdrant
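
To confirm that indexing has written something to Qdrant, you can list its collections over the HTTP API on port 6333. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the collection names you see depend on how Kilo Code names them, which is not documented here.

```python
import json
import urllib.request


def collection_names(response):
    """Extract collection names from the JSON body of GET /collections."""
    return [collection["name"] for collection in response["result"]["collections"]]


def list_collections(qdrant_url="http://localhost:6333", timeout=5):
    """Query a running Qdrant instance and return its collection names."""
    with urllib.request.urlopen(f"{qdrant_url}/collections", timeout=timeout) as response:
        return collection_names(json.load(response))
```

Calling `list_collections()` while the qdrant container is running returns a list of names; an empty list suggests that nothing has been indexed yet.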

Embeddings Service (GPU)

Image: ghcr.io/huggingface/text-embeddings-inference:1.8
Port: 8080
Model: jinaai/jina-embeddings-v2-base-code
GPU Support: Requires NVIDIA GPU with CUDA
Cache: Volume hf_cache stores downloaded models
Container Name: embeddings

Supported GPU Architectures

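Each GPU architecture needs a matching image tag. The default tag in this setup (:1.8) targets Ampere 80; for other GPUs, use the tag listed:

Ampere 80 (A100, A30) - use :1.8
Ampere 86 (A10, A40) - use :86-1.8
Ada Lovelace (RTX 4000 series) - use :89-1.8
Turing (T4, RTX 2000 series) - use :turing-1.8
Hopper (H100) - use :hopper-1.8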

To use a different GPU architecture, edit compose.yml line 15.
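
If you are unsure which tag applies, the GPU's CUDA compute capability identifies the architecture. The sketch below maps a compute capability string to the image reference to put in compose.yml. The capability values (7.5, 8.0, 8.6, 8.9, 9.0) come from NVIDIA's published figures rather than from this repository, and the function name is hypothetical. On recent drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` should print the value.

```python
IMAGE = "ghcr.io/huggingface/text-embeddings-inference"
TEI_VERSION = "1.8"

# Compute capability -> tag prefix, following the list above.
TAG_PREFIX_BY_CAPABILITY = {
    "7.5": "turing-",  # Turing (T4, RTX 2000 series)
    "8.0": "",         # Ampere 80 (A100, A30)
    "8.6": "86-",      # Ampere 86 (A10, A40)
    "8.9": "89-",      # Ada Lovelace (RTX 4000 series)
    "9.0": "hopper-",  # Hopper (H100)
}


def tei_image(compute_capability):
    """Return the full image reference for a given compute capability."""
    try:
        prefix = TAG_PREFIX_BY_CAPABILITY[compute_capability]
    except KeyError:
        raise ValueError(
            f"no image tag listed for compute capability {compute_capability}"
        ) from None
    return f"{IMAGE}:{prefix}{TEI_VERSION}"


print(tei_image("8.9"))  # prints "ghcr.io/huggingface/text-embeddings-inference:89-1.8"
```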

Adding CPU Support (Optional)

If you need to run on systems without a compatible GPU, you can add CPU support:

1. Create CPU Build Context

Create a directory ./embeddings-cpu with a Dockerfile for CPU-based embeddings:

FROM python:3.11-slim

WORKDIR /app

# Install dependencies
RUN pip install sentence-transformers fastapi uvicorn torch --extra-index-url https://download.pytorch.org/whl/cpu

# Copy application code
COPY app.py .

# Expose port
EXPOSE 80

# Run the application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "80"]

Create app.py to serve embeddings:

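from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()

# The Jina v2 models ship custom modeling code, so loading them
# requires trust_remote_code=True.
model = SentenceTransformer(
    "jinaai/jina-embeddings-v2-base-code", trust_remote_code=True
)


class EmbeddingRequest(BaseModel):
    # Accept a single string or a batch of strings.
    inputs: str | list[str]


@app.post("/embed")
def embed(request: EmbeddingRequest):
    # A plain "def" lets FastAPI run the blocking encode() call in its thread pool.
    texts = [request.inputs] if isinstance(request.inputs, str) else request.inputs
    # Always returns a list of vectors, one per input text.
    embeddings = model.encode(texts)
    return {"embeddings": embeddings.tolist()}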

2. Create CPU Start Script

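Create start_cpu_code_indexing.ps1. The script assumes compose.yml also defines an embeddings service under a cpu profile that builds from ./embeddings-cpu: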

# Change to the script's directory where compose.yml is located
Set-Location $PSScriptRoot

# Start Docker Compose with CPU profile
docker compose --profile cpu up -d

3. Switch Between Profiles

GPU: .\start_code_indexing.ps1
CPU: .\start_cpu_code_indexing.ps1

Note: The CPU profile is significantly slower but works on all systems.

Scripts Reference

start_code_indexing.ps1

Main startup script that starts Docker Compose with the GPU profile:

Changes to script directory
Runs docker compose --profile gpu up -d
Starts Qdrant + GPU embeddings service

stop_code_indexing.ps1

Stops all services and cleans up:

Runs docker compose down --remove-orphans
Force removes any remaining containers
Keeps data volumes intact

Troubleshooting

Container Name Conflict

Error: The container name "/embeddings" is already in use

Solution: Run the stop script to clean up:

.\stop_code_indexing.ps1

Image Pull Failed

Error: manifest unknown

Solution: Ensure you're using a valid image tag. The current configuration uses :1.8; if you changed it for a different GPU architecture, check it against the tags listed under "Supported GPU Architectures".

GPU Not Detected

Error: GPU services fail to start or container crashes

Solution:

Verify NVIDIA drivers are installed: nvidia-smi
Install NVIDIA Container Toolkit
Restart Docker Desktop
Consider adding CPU support if GPU is unavailable (see "Adding CPU Support" section)

Connection Refused in Kilo Code

Symptom: Kilo Code cannot connect to Qdrant or Embeddings service

Solution:

Verify containers are running: docker ps
Check container logs: docker logs qdrant or docker logs embeddings
Ensure ports 6333 and 8080 are not in use by other applications
Verify URLs in Kilo Code settings match http://localhost:6333 and http://localhost:8080
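
The port checks above can also be scripted. This is a minimal sketch that tests whether anything accepts TCP connections on the two ports used in this setup; it only shows that a listener exists, not that the service behind it is healthy. The function names are hypothetical.

```python
import socket

# Ports as exposed in this setup.
SERVICES = {"Qdrant": 6333, "Embeddings": 8080}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost"):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        print(f"{name}: {'reachable' if reachable else 'not reachable'}")
```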

Data Persistence

Data is persisted in Docker volumes:

qdrant_data: Stores all indexed code vectors
hf_cache: Stores downloaded HuggingFace models

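To completely remove all data, stop the services first, then remove both volumes. The kilo-code-local_ prefix is the Compose project name, which defaults to the name of the directory containing compose.yml; adjust it if yours differs: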

docker volume rm kilo-code-local_qdrant_data
docker volume rm kilo-code-local_hf_cache

Performance Considerations

GPU (Current Setup)

Pros: 3-10x faster indexing, real-time performance for large codebases
Cons: Requires NVIDIA GPU, higher memory usage
Recommended For: All users with compatible NVIDIA GPU

CPU (Optional Addition)

Pros: Works on any system, lower resource requirements
Cons: Significantly slower indexing, may struggle with large codebases
Recommended For: Systems without GPU, occasional indexing, small projects

To add CPU support, see the "Adding CPU Support" section above.

Advanced Configuration

Change Embedding Model

Edit compose.yml line 31 to use a different model:

command: --model-id <model-name>

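Popular alternatives:

sentence-transformers/all-MiniLM-L6-v2 (smaller, faster; 384 dimensions)
BAAI/bge-base-en-v1.5 (general purpose; 768 dimensions)
jinaai/jina-embeddings-v2-base-code (optimized for code - default; 768 dimensions)

After changing the model, update Model and Model Dimension in the Kilo Code settings to match.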

Expose Services to Network

To access from other machines, edit compose.yml ports:

ports:
  - "0.0.0.0:6333:6333"  # Qdrant
  - "0.0.0.0:8080:80"    # Embeddings

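Warning: Only do this on trusted networks. In this setup neither service requires authentication: the Qdrant API key is empty and the embeddings API key is a dummy value.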

License

This configuration is provided as-is for use with Kilo Code.
