AI-driven real-time review clustering system for streaming customer feedback.
- Ingests reviews via HTTP.
- Embeds text with BGE small (384d).
- Reduces to 5d with Parametric UMAP.
- Clusters online with River DBSTREAM.
- Names clusters with an optional LLM.
- Streams results to Postgres, Redis, and a live Dash dashboard.
FastAPI -> Kafka -> Embedder -> Clusterer -> Namer -> Postgres + Redis -> Dash
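The Clusterer stage assigns each reduced 5-d vector to a cluster as soon as it arrives, without revisiting earlier reviews. To show the idea of online clustering, here is a much simpler leader-style algorithm in plain Python. It is not River's DBSTREAM: DBSTREAM also fades micro-cluster weights over time and merges micro-clusters by shared density, and this sketch omits both. The class name `LeaderClusterer` and the `radius` parameter are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    center: list[float]
    count: int = 1


@dataclass
class LeaderClusterer:
    """Simplified online clustering stand-in (hypothetical; NOT River's DBSTREAM)."""

    radius: float = 1.0
    clusters: list[MicroCluster] = field(default_factory=list)

    def learn_predict_one(self, x: list[float]) -> int:
        # Find the nearest existing cluster center by Euclidean distance.
        best_id, best_dist = -1, math.inf
        for cid, mc in enumerate(self.clusters):
            d = math.dist(x, mc.center)
            if d < best_dist:
                best_id, best_dist = cid, d
        if best_id == -1 or best_dist > self.radius:
            # Nothing close enough: open a new cluster centered on this point.
            self.clusters.append(MicroCluster(center=list(x)))
            return len(self.clusters) - 1
        # Otherwise pull the matched center toward x (incremental mean).
        mc = self.clusters[best_id]
        mc.count += 1
        mc.center = [c + (xi - c) / mc.count for c, xi in zip(mc.center, x)]
        return best_id


# Usage: two nearby 5-d points share a cluster; a distant one opens a new cluster.
clusterer = LeaderClusterer(radius=0.5)
labels = [clusterer.learn_predict_one(p) for p in ([0.0] * 5, [0.1] * 5, [3.0] * 5)]
print(labels)  # [0, 0, 1]
```

Each point is processed once and only the cluster summaries are kept, which is what lets a streaming clusterer keep up with an unbounded Kafka topic.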
- Create a virtual env and install deps:
uv venv
source .venv/bin/activate
uv pip install -e ".[dev,test]"
- Configure config.yaml (or use config.docker.yaml with Docker).
- Start infrastructure (recommended):
docker compose up --build
- Ingest a review:
curl -X POST http://localhost:8000/reviews \
  -H "Content-Type: application/json" \
  -d '{"text": "Love the new UI"}'
- Open the dashboard at http://localhost:8050.
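The ingest call from the quickstart can also be made from Python using only the standard library. This sketch assumes the endpoint accepts exactly the JSON body shown in the curl example. The helper names `build_review_request` and `post_review` are hypothetical, and because the response body is not documented here, `post_review` returns only the HTTP status code.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/reviews"


def build_review_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build the same POST the curl example sends: a JSON body with a "text" field."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_review(text: str, url: str = API_URL, timeout: float = 5.0) -> int:
    """Send one review and return the HTTP status code (requires the API to be running)."""
    req = build_review_request(text, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

With the stack running, `post_review("Love the new UI")` sends the same request as the curl command.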
- To run the services individually:
uv run python -m sentistream.services.embedder_svc
uv run python -m sentistream.services.clusterer_svc
uv run python -m sentistream.services.namer_svc
uv run python -m sentistream.dashboard.app
uv run uvicorn sentistream.ingestion.api:app --host 0.0.0.0 --port 8000
- Run the tests:
uv run pytest
- Enable the live LLM test only when you have a valid key:
export SENTISTREAM_LLM_TESTS=1
export LITELLM_API_KEY=your_key_here
export SENTISTREAM_LLM_MODEL=gpt-4o-mini # optional
uv run pytest tests/test_namer.py
Key settings live in config.yaml:
- llm.model, llm.api_key
- kafka.bootstrap_servers, kafka.topics
- database.postgres_dsn, database.redis_url
- ml.embedder_onnx_dir, ml.umap_onnx_path, ml.hf_repo_id
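The dotted setting names can be read as paths into nested sections of config.yaml (llm, kafka, database, ml); that nesting is an assumption about the file layout. Below is a minimal standard-library sketch of such a lookup. Every value in `EXAMPLE_CONFIG` is a placeholder, and `get_setting` is a hypothetical helper, not part of sentistream.

```python
from typing import Any

# Hypothetical placeholder values: only the key names come from this README.
# Assumes config.yaml nests keys by section (llm:, kafka:, database:, ml:).
EXAMPLE_CONFIG: dict[str, Any] = {
    "llm": {"model": "gpt-4o-mini", "api_key": None},
    "kafka": {"bootstrap_servers": "localhost:9092", "topics": {}},  # topics layout not documented
    "database": {
        "postgres_dsn": "postgresql://user:password@localhost:5432/sentistream",
        "redis_url": "redis://localhost:6379/0",
    },
    "ml": {
        "embedder_onnx_dir": "models/embedder",
        "umap_onnx_path": "models/umap.onnx",
        "hf_repo_id": None,
    },
}


def get_setting(cfg: dict[str, Any], dotted_key: str, default: Any = None) -> Any:
    """Resolve a dotted key such as "kafka.bootstrap_servers" against nested sections."""
    node: Any = cfg
    for part in dotted_key.split("."):
        # Stop early if the path leaves the nested dicts or a key is missing.
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```

For example, `get_setting(EXAMPLE_CONFIG, "database.redis_url")` returns the Redis URL, while an unknown key falls back to `default`.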
- Models are stored under models/ and can be downloaded automatically from Hugging Face when configured.
- The naming service is optional; when it is disabled, clusters fall back to a placeholder name.
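The optional-naming behavior can be pictured as a function that uses an LLM when one is configured and otherwise returns a placeholder. This is a minimal sketch: the `name_cluster` signature, the prompt text, and the `"Cluster {id}"` placeholder format are all hypothetical, and `llm` stands in for whatever client the Namer actually calls. Falling back on an empty LLM reply is an extra choice made here, not something this README states.

```python
from typing import Callable, Optional, Sequence


def name_cluster(
    cluster_id: int,
    sample_texts: Sequence[str],
    llm: Optional[Callable[[str], str]] = None,
) -> str:
    """Return an LLM-generated label, or a placeholder when naming is disabled."""
    placeholder = f"Cluster {cluster_id}"  # hypothetical placeholder format
    if llm is None:
        # Naming disabled: skip the LLM entirely.
        return placeholder
    prompt = "Give a short label for these customer reviews:\n" + "\n".join(
        f"- {t}" for t in sample_texts
    )
    label = llm(prompt).strip()
    # Also fall back if the model returns nothing usable.
    return label or placeholder
```

Passing `llm=None` mirrors running without the naming service, so the rest of the pipeline never blocks on an LLM.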