Problem
When source.type: local points at a directory containing .md or .txt files, the pipeline silently skips them — only .html, .pdf, .png, and .jpg are handled. Users have to manually convert markdown to HTML before indexing.
Reproduction
mkdir docs && echo "# Hello World" > docs/readme.md
cat > pixelrag.yaml <<EOF
source:
  type: local
  path: ./docs
embed:
  model: Qwen/Qwen3-VL-Embedding-2B
  device: auto
output: ./index
EOF
pixelrag index build
# Stage 1: Rendering 0 documents to tiles...
# (readme.md was silently ignored)
Proposed fix
In index/src/pixelrag_index/sources/local.py, add .md and .txt to the supported extensions. In pipelines.py, add a text_docs category that:
- Wraps the text/markdown content in a minimal styled HTML template
- Renders via the CDP backend (like URL docs)
This is lightweight: .txt content only needs to be HTML-escaped and wrapped in <html><body><pre>, and .md needs a basic markdown→HTML conversion (the markdown package, or even a regex-based converter).
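A minimal sketch of the wrapping step, to make the proposal concrete. The names `TEXT_EXTENSIONS`, `HTML_TEMPLATE`, `text_to_html`, and `wrap_text_file` are hypothetical and do not exist in pixelrag today. The styling is a placeholder, and the fallback to escaped `<pre>` when the `markdown` package is missing is an assumption about the desired behavior.

```python
import html
from pathlib import Path

try:
    import markdown  # optional third-party "markdown" package
except ImportError:
    markdown = None

# Hypothetical: extensions that would be routed to the new text_docs category.
TEXT_EXTENSIONS = {".md", ".txt"}

# Placeholder template; CSS braces are doubled because of str.format.
HTML_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
body {{ font-family: sans-serif; max-width: 800px; margin: 2em auto; }}
pre {{ white-space: pre-wrap; }}
</style>
</head>
<body>
{body}
</body>
</html>
"""


def text_to_html(text: str, suffix: str) -> str:
    """Wrap plain text or markdown in a minimal HTML page."""
    if suffix.lower() == ".md" and markdown is not None:
        # Convert markdown to an HTML fragment.
        body = markdown.markdown(text)
    else:
        # Escape so that characters like < and & render literally.
        body = "<pre>" + html.escape(text) + "</pre>"
    return HTML_TEMPLATE.format(body=body)


def wrap_text_file(path: Path) -> str:
    """Read a .md or .txt file and return the HTML page to render."""
    return text_to_html(path.read_text(encoding="utf-8"), path.suffix)
```

The resulting HTML string could then go through the same CDP rendering path as URL docs, for example by writing it to a temporary .html file first. How that hand-off works depends on the existing backend.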
Related