This repository contains Docker Compose configuration and PowerShell scripts to set up local code indexing for Kilo Code using Qdrant vector database and HuggingFace embeddings.
Kilo Code supports semantic code search and context-aware assistance through vector embeddings. This setup provides:
- Qdrant: Vector database for storing code embeddings
- Text Embeddings Inference: HuggingFace GPU-accelerated service for generating embeddings from code
- Optimized for Performance: Uses GPU acceleration for fast code indexing
graph LR
A[Kilo Code] --> B[Embeddings Service]
A --> C[Qdrant Vector DB]
B --> D[HuggingFace Model]
D --> E[jinaai/jina-embeddings-v2-base-code]
- Docker Desktop with Docker Compose support
- Windows 11 with PowerShell
- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit (for GPU access in Docker)
Run the start script to launch both Qdrant and the GPU embeddings service:
.\start_code_indexing.ps1
This will start Docker Compose with the GPU profile for fast, GPU-accelerated code indexing.
You can configure indexing in Kilo Code by clicking here:
Set the endpoint values like this (text values below):
- Enable Codebase Indexing: checked
- Embedder Provider: OpenAI Compatible
- Base URL: http://localhost:8080
- API Key: DUMMY_KEY
- Model: jinaai/jina-embeddings-v2-base-code
- Model Dimension: 768
- Vector Store Provider: Qdrant
- Qdrant URL: http://localhost:6333
- Qdrant API Key: (empty)
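To sanity-check these values outside Kilo Code, a small script can send one request to the embeddings service and confirm the vector length. The sketch below assumes the service answers the OpenAI-style POST /v1/embeddings route and returns the usual OpenAI response layout; the helper names are illustrative and not part of this repository.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"   # Base URL from the settings above
MODEL = "jinaai/jina-embeddings-v2-base-code"
EXPECTED_DIM = 768                   # Model Dimension from the settings above


def build_request(texts, base_url=BASE_URL, model=MODEL, api_key="DUMMY_KEY"):
    """Build an OpenAI-style embeddings request (assumed route: /v1/embeddings)."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/embeddings",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def vector_lengths(response_json):
    """Return the length of each embedding in an OpenAI-style response."""
    return [len(item["embedding"]) for item in response_json["data"]]


def check_dimension(texts, expected_dim=EXPECTED_DIM):
    """Send one request and report whether every vector has the expected size.

    Needs the containers to be running, so it is not called at import time.
    """
    with urllib.request.urlopen(build_request(texts), timeout=30) as resp:
        payload = json.load(resp)
    return all(n == expected_dim for n in vector_lengths(payload))
```

With both containers up, calling check_dimension(["def add(a, b): return a + b"]) should return True if the Model Dimension setting matches what the service actually produces.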
Also set the advanced settings:
Once you have done this, click Save at the bottom. This will now enable a Start Indexing button to the left. Click that and you should see that indexing has started:
Important Notes:
- Both services must be running (via start_code_indexing.ps1) for code indexing to work
- Use localhost (not 127.0.0.1 or 0.0.0.0) for compatibility
- Port numbers (6333 and 8080) match the exposed ports in compose.yml
- If you see connection errors, verify containers are running: docker ps
- For the jinaai/jina-embeddings-v2-base-code model, Embedding Batch Size cannot exceed 30.
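Kilo Code applies the batch limit itself through the Embedding Batch Size setting. For anyone calling the embeddings service from their own script, the same limit can be respected by slicing the inputs first. A minimal sketch, with hypothetical names:

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SIZE = 30  # limit stated above for jinaai/jina-embeddings-v2-base-code


def batches(items: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Example: 65 code chunks become batches of 30, 30 and 5.
chunks = [f"chunk {i}" for i in range(65)]
sizes = [len(batch) for batch in batches(chunks)]
print(sizes)  # [30, 30, 5]
```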
When you're done or need to restart:
.\stop_code_indexing.ps1
This will cleanly shut down all containers and remove any orphaned containers.
- Ports:
  - 6333: HTTP API
  - 6334: gRPC API
- Data Persistence: Volume qdrant_data stores indexed code
- Container Name: qdrant
- Image: ghcr.io/huggingface/text-embeddings-inference:1.8
- Port: 8080
- Model: jinaai/jina-embeddings-v2-base-code
- GPU Support: Requires NVIDIA GPU with CUDA
- Cache: Volume hf_cache stores downloaded models
- Container Name: embeddings
The default GPU image tag (:1.8) targets Ampere 80. Other architectures need a prefixed tag:
- Ampere 80 (A100, A30) - use :1.8
- Ampere 86 (A10, A40) - use :86-1.8
- Ada Lovelace (RTX 4000 series) - use :89-1.8
- Turing (T4, RTX 2000 series) - use :turing-1.8
- Hopper (H100) - use :hopper-1.8
To use a different GPU architecture, edit compose.yml line 15.
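If you are unsure which tag matches your card, the list above can be expressed as a lookup keyed by CUDA compute capability. The sketch below is an illustration, not part of this repository: the function names are hypothetical, and the nvidia-smi query for compute_cap is assumed to be available (it needs a reasonably recent driver).

```python
import subprocess

TEI_VERSION = "1.8"  # image version used in this README

# Compute capability -> tag prefix, following the list above.
TAG_PREFIX = {
    "8.0": "",         # Ampere 80 (A100, A30): plain :1.8
    "8.6": "86-",      # Ampere 86 (A10, A40)
    "8.9": "89-",      # Ada Lovelace (RTX 4000 series)
    "7.5": "turing-",  # Turing (T4, RTX 2000 series)
    "9.0": "hopper-",  # Hopper (H100)
}


def image_tag(compute_cap: str, version: str = TEI_VERSION) -> str:
    """Return the image tag for a compute capability such as '8.6'."""
    cap = compute_cap.strip()
    if cap not in TAG_PREFIX:
        raise ValueError(f"no tag listed in this README for compute capability {cap}")
    return TAG_PREFIX[cap] + version


def detect_compute_cap() -> str:
    """Ask nvidia-smi for the first GPU's compute capability (assumed query field)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()[0].strip()
```

For example, image_tag("8.9") gives "89-1.8", which would replace the tag after the colon in the image line of compose.yml.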
If you need to run on systems without a compatible GPU, you can add CPU support:
Create a directory ./embeddings-cpu with a Dockerfile for CPU-based embeddings:
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
RUN pip install sentence-transformers fastapi uvicorn torch --extra-index-url https://download.pytorch.org/whl/cpu
# Copy application code
COPY app.py .
# Expose port
EXPOSE 80
# Run the application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "80"]
Create app.py to serve embeddings:
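from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()

# Load the model once at startup. The Jina v2 models ship custom modeling
# code on the Hub, so trust_remote_code=True is needed to load them.
model = SentenceTransformer('jinaai/jina-embeddings-v2-base-code', trust_remote_code=True)

class EmbeddingRequest(BaseModel):
    inputs: str

@app.post("/embed")
def embed(request: EmbeddingRequest):
    # A plain def lets FastAPI run the blocking encode() call in its thread pool.
    embeddings = model.encode(request.inputs)
    # encode() returns a NumPy array; convert it to a list for the JSON response.
    return {"embeddings": embeddings.tolist()}

Create start_cpu_code_indexing.ps1: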
# Change to the script's directory where compose.yml is located
Set-Location $PSScriptRoot
# Start Docker Compose with CPU profile
docker compose --profile cpu up -d
Choose the start script that matches your hardware:
- GPU: .\start_code_indexing.ps1
- CPU: .\start_cpu_code_indexing.ps1
Note: The CPU profile is significantly slower but works on all systems.
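One thing to keep in mind with the CPU app: it returns {"embeddings": ...} from /embed, while Kilo Code is configured with the OpenAI Compatible provider. If the CPU service is meant to answer OpenAI-style requests, its responses would need the OpenAI embeddings layout. The helpers below are a sketch of that formatting only. The names are hypothetical, and which fields Kilo Code actually reads has not been verified here.

```python
from typing import Dict, List, Sequence, Union


def normalize_input(value: Union[str, Sequence[str]]) -> List[str]:
    """OpenAI-style requests may pass `input` as one string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def openai_embeddings_response(vectors: Sequence[Sequence[float]], model: str) -> Dict:
    """Wrap raw vectors in the OpenAI embeddings response layout."""
    return {
        "object": "list",
        "model": model,
        "data": [
            {"object": "embedding", "index": i, "embedding": list(vec)}
            for i, vec in enumerate(vectors)
        ],
        # Token counts are not tracked in this sketch, so zeros are reported.
        "usage": {"prompt_tokens": 0, "total_tokens": 0},
    }
```

In app.py, a hypothetical POST /v1/embeddings route could encode normalize_input(...) with the loaded model and return openai_embeddings_response(vectors.tolist(), model_name).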
start_code_indexing.ps1 is the main startup script. It starts Docker Compose with the GPU profile:
- Changes to script directory
- Runs docker compose --profile gpu up -d
- Starts Qdrant + GPU embeddings service
stop_code_indexing.ps1 stops all services and cleans up:
- Runs docker compose down --remove-orphans
- Force removes any remaining containers
- Keeps data volumes intact
Error: The container name "/embeddings" is already in use
Solution: Run the stop script to clean up:
.\stop_code_indexing.ps1
Error: manifest unknown
Solution: Ensure you're using the correct image tag. The current configuration uses :1.8 which is the latest stable version.
Error: GPU services fail to start or container crashes
Solution:
- Verify NVIDIA drivers are installed: nvidia-smi
- Install NVIDIA Container Toolkit
- Restart Docker Desktop
- Consider adding CPU support if GPU is unavailable (see "Adding CPU Support" section)
Symptom: Kilo Code cannot connect to Qdrant or Embeddings service
Solution:
- Verify containers are running: docker ps
- Check container logs: docker logs qdrant or docker logs embeddings
- Ensure ports 6333 and 8080 are not in use by other applications
- Verify URLs in Kilo Code settings match http://localhost:6333 and http://localhost:8080
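The reachability part of these checks can also be scripted. The sketch below probes both services over HTTP. The health paths are assumptions (Qdrant's /healthz and Text Embeddings Inference's /health), and the function names are illustrative.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Assumed health endpoints; adjust if your versions expose different paths.
ENDPOINTS = {
    "qdrant": "http://localhost:6333/healthz",
    "embeddings": "http://localhost:8080/health",
}


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # nothing listening, connection refused, or timeout


def check_services(fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each service name to True when its endpoint answers with 200."""
    return {name: fetch(url) == 200 for name, url in ENDPOINTS.items()}


def report(results: Dict[str, bool]) -> str:
    """Format the results as one line per service."""
    return "\n".join(
        f"{name}: {'ok' if ok else 'NOT reachable'}" for name, ok in results.items()
    )
```

Running print(report(check_services())) while the containers are up should list both services as ok. A service reported as not reachable is a cue to look at docker ps and the container logs.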
Data is persisted in Docker volumes:
- qdrant_data: Stores all indexed code vectors
- hf_cache: Stores downloaded HuggingFace models
To completely remove all data:
docker volume rm kilo-code-local_qdrant_data
docker volume rm kilo-code-local_hf_cache
GPU profile:
- Pros: 3-10x faster indexing, real-time performance for large codebases
- Cons: Requires NVIDIA GPU, higher memory usage
- Recommended For: All users with compatible NVIDIA GPU
CPU profile:
- Pros: Works on any system, lower resource requirements
- Cons: Significantly slower indexing, may struggle with large codebases
- Recommended For: Systems without GPU, occasional indexing, small projects
To add CPU support, see the "Adding CPU Support" section above.
Edit compose.yml line 31 to use a different model:
command: --model-id <model-name>
Popular alternatives:
- sentence-transformers/all-MiniLM-L6-v2 (smaller, faster)
- BAAI/bge-base-en-v1.5 (general purpose)
- jinaai/jina-embeddings-v2-base-code (optimized for code - default)
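Switching models also means updating the Model and Model Dimension fields in Kilo Code, since the models do not all produce vectors of the same size. The lookup below records the output dimensions of the three models listed here. The helper name is hypothetical, and vectors already indexed at a different size cannot be reused, so expect to re-index after a change.

```python
# Output vector sizes for the models listed above.
MODEL_DIMENSIONS = {
    "jinaai/jina-embeddings-v2-base-code": 768,
    "BAAI/bge-base-en-v1.5": 768,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
}


def kilo_code_settings(model_id: str) -> dict:
    """Return the Model and Model Dimension values to enter in Kilo Code."""
    if model_id not in MODEL_DIMENSIONS:
        raise ValueError(f"dimension for {model_id} is not recorded in this sketch")
    return {"Model": model_id, "Model Dimension": MODEL_DIMENSIONS[model_id]}


print(kilo_code_settings("sentence-transformers/all-MiniLM-L6-v2"))
```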
To access from other machines, edit compose.yml ports:
ports:
- "0.0.0.0:6333:6333" # Qdrant
- "0.0.0.0:8080:80" # Embeddings
Warning: Only do this on trusted networks.
This configuration is provided as-is for use with Kilo Code.



