[Feature] Support .md and .txt files natively in the index pipeline #100

Description

@aafaq-rashid-comprinno

Problem

When source.type: local points at a directory containing .md or .txt files, the pipeline silently skips them. Only .html, .pdf, .png, and .jpg are handled, so users have to manually convert Markdown to HTML before indexing.

Reproduction

mkdir docs && echo "# Hello World" > docs/readme.md
cat > pixelrag.yaml <<EOF
source:
  type: local
  path: ./docs
embed:
  model: Qwen/Qwen3-VL-Embedding-2B
  device: auto
output: ./index
EOF
pixelrag index build
# Stage 1: Rendering 0 documents to tiles...
# (readme.md was silently ignored)

Proposed fix

In index/src/pixelrag_index/sources/local.py, add .md and .txt to the supported extensions. In pipelines.py, add a text_docs category that:

  1. Wraps the text/markdown content in a minimal styled HTML template
  2. Renders via the CDP backend (like URL docs)

This is lightweight: .txt content only needs to be wrapped in <html><body><pre>, and .md only needs a basic markdown→HTML conversion (the markdown package, or even a regex-based converter).
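
A minimal sketch of the wrapping step, to make the proposal concrete. The names TEXT_EXTENSIONS, HTML_TEMPLATE, and text_to_html are hypothetical and not taken from the pixelrag codebase, and the styling is a placeholder. It assumes the third-party markdown package may or may not be installed, and falls back to <pre> wrapping for .md when it is missing.

```python
import html
from pathlib import Path

# Hypothetical names for illustration only; not part of the existing code.
TEXT_EXTENSIONS = {".md", ".txt"}

# Doubled braces are literal CSS braces; {body} is the only placeholder.
HTML_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
body {{ font-family: sans-serif; max-width: 48em; margin: 2em auto; }}
pre {{ white-space: pre-wrap; }}
</style>
</head>
<body>
{body}
</body>
</html>
"""


def text_to_html(path: Path) -> str:
    """Wrap a .txt or .md file in a minimal styled HTML document."""
    suffix = path.suffix.lower()
    if suffix not in TEXT_EXTENSIONS:
        raise ValueError(f"unsupported text extension: {suffix!r}")

    content = path.read_text(encoding="utf-8")
    escaped = f"<pre>{html.escape(content)}</pre>"

    if suffix == ".md":
        try:
            import markdown  # optional third-party dependency
        except ImportError:
            # Assumption: showing raw markdown is better than skipping the file.
            body = escaped
        else:
            body = markdown.markdown(content)
    else:
        # Plain text: escape HTML-special characters and keep whitespace.
        body = escaped

    return HTML_TEMPLATE.format(body=body)
```

The returned string could then be written to a temporary .html file and passed to whatever path the CDP backend already uses for HTML documents. That hand-off is left out here because it depends on the pipeline's internal API.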
