RO-Crate RAG Validator

A semantic validation tool for RO-Crate metadata. It checks each entity in an ro-crate-metadata.json against one or more RO-Crate profile specifications, using retrieval-augmented generation (RAG) to pull the relevant rules per entity and an LLM to judge both compliance and usefulness for the intended audience.
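The retrieval step is what lets the validator send only the relevant rules for each entity. Below is a minimal sketch of that idea, under stated assumptions. It splits profile markdown into chunks at lines starting with `#`. It then ranks those chunks against an entity's `@type`. For ranking it uses a bag-of-words cosine score as a stand-in for real embeddings. The project itself uses an embeddings provider and Chroma, so `chunk_profile` and `rules_for_type` are hypothetical names, not the project's API.

```python
import math
import re
from collections import Counter


def chunk_profile(markdown: str) -> list[str]:
    """Split profile markdown into chunks, starting a new chunk at each '#' heading."""
    chunks, current = [], []
    for line in markdown.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def _vector(text: str) -> Counter:
    # Bag-of-words term counts: a toy substitute for an embedding vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rules_for_type(chunks: list[str], entity_type: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the entity's @type."""
    query = _vector(entity_type)
    ranked = sorted(chunks, key=lambda c: _cosine(query, _vector(c)), reverse=True)
    return ranked[:k]
```

For example, a profile with `# Dataset` and `# Person` sections produces two chunks. A query for `Person` ranks the Person section first.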

Features

- Bring your own model: any OpenAI-compatible provider (OpenAI, DeepSeek, Ollama, Together, …) via providers.yaml. Embeddings are configured independently of the chat model.
- Actionable output: each finding includes typed fix operations (add / replace / remove) with suggested values.
- Context-aware: checks whether metadata is useful for the intended audience, not only whether it complies with the profile.
- Two interfaces: a Streamlit UI and a CLI entry point.
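The typed fix operations can be pictured as small records applied to an entity. The sketch below is hypothetical. `FixOperation`, its fields, and `apply_fixes` are illustrative names, not the project's schema. It also assumes strict semantics: `add` fails if the property already exists, and `replace` fails if it does not.

```python
import copy
from dataclasses import dataclass
from typing import Any, Literal, Optional


@dataclass
class FixOperation:
    """Hypothetical shape of one suggested fix for a single entity property."""
    op: Literal["add", "replace", "remove"]
    prop: str
    value: Optional[Any] = None


def apply_fixes(entity: dict, fixes: list[FixOperation]) -> dict:
    """Return a patched copy of the entity; the input dict is left unchanged."""
    fixed = copy.deepcopy(entity)
    for fix in fixes:
        if fix.op == "remove":
            fixed.pop(fix.prop, None)
        elif fix.op == "add":
            # Assumed strict semantics: 'add' never overwrites an existing value.
            if fix.prop in fixed:
                raise ValueError(f"'{fix.prop}' already present; use replace")
            fixed[fix.prop] = fix.value
        elif fix.op == "replace":
            if fix.prop not in fixed:
                raise KeyError(fix.prop)
            fixed[fix.prop] = fix.value
        else:
            raise ValueError(f"unknown op: {fix.op}")
    return fixed
```

Working on a deep copy keeps the original crate intact. The UI can then show suggested changes side by side with the original before anything is written back.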

Architecture

```mermaid
flowchart LR
    subgraph UI
        APP[app.py<br/>Streamlit]
    end
    subgraph Core
        CFG[config.py<br/>providers.yaml]
        LLM[llm.py<br/>factories]
        REG[profile_registry.py<br/>RAG]
        VAL[validation.py<br/>CrateValidator]
    end
    APP --> CFG
    APP --> LLM
    APP --> REG
    APP --> VAL
    LLM --> REG
    LLM --> VAL
    VAL --> REG
```

Data flow

```mermaid
sequenceDiagram
    actor User
    participant App as app.py
    participant Reg as ProfileRegistry
    participant Emb as Embeddings provider
    participant Val as CrateValidator
    participant Chat as Chat provider

    User->>App: upload crate + profiles, pick provider
    App->>Reg: ingest profile markdown
    Reg->>Emb: embed chunks
    Emb-->>Reg: vectors (Chroma)
    loop per entity
        App->>Val: validate_entity(entity, audience)
        Val->>Reg: get_rules_for_entity(@type)
        Reg-->>Val: relevant rules
        Val->>Chat: prompt(rules + entity + audience)
        Chat-->>Val: structured ValidationResult
    end
    Val-->>App: results
    App-->>User: issues, fixes, quality, JSON report
```
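The per-entity loop in the diagram can be sketched with the retriever and chat model passed in as plain callables. This is an illustrative sketch, not the code in validation.py. `validate_crate` and `entity_types` are hypothetical names, and a plain dict stands in for `ValidationResult`. It relies on one fact about RO-Crate metadata: entities live in the top-level `@graph` array, and each entity's `@type` may be a single string or a list of strings.

```python
import json
from typing import Callable


def entity_types(entity: dict) -> list[str]:
    # In JSON-LD, @type may be a single string or a list of strings.
    t = entity.get("@type", [])
    return [t] if isinstance(t, str) else list(t)


def validate_crate(
    crate_json: str,
    get_rules: Callable[[str], list[str]],  # stand-in for the registry lookup
    chat: Callable[[str], dict],             # stand-in for the chat model call
    audience: str = "",
) -> list[dict]:
    """Validate every entity in @graph against the rules retrieved for its types."""
    crate = json.loads(crate_json)
    results = []
    for entity in crate.get("@graph", []):
        rules: list[str] = []
        for t in entity_types(entity):
            rules.extend(get_rules(t))
        prompt = (
            "Rules:\n" + "\n".join(rules)
            + "\n\nEntity:\n" + json.dumps(entity, indent=2)
            + f"\n\nIntended audience: {audience or 'unspecified'}"
        )
        # Tag each result with the entity's @id so findings can be traced back.
        results.append({"@id": entity.get("@id"), **chat(prompt)})
    return results
```

Because the retriever and chat model are injected, the same loop runs unchanged against real providers or test doubles.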

Setup

Create and activate a virtual environment:

```
python -m venv .venv && source .venv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Provide API keys in a .env file in the project root. Each provider reads the key named by its key_env in providers.yaml:
```
OPENAI_API_KEY=sk-...
DEEPSEEK_API_KEY=sk-...   # only if using DeepSeek
```

Configuring providers

Providers are defined in providers.yaml. For example, the DeepSeek entry:
```
deepseek:
  base_url: https://api.deepseek.com
  key_env: DEEPSEEK_API_KEY
  models: [deepseek-chat]
  embedding_models: []        # DeepSeek has no embeddings API
```

Because DeepSeek has no embeddings API, pair it with a separate embeddings provider in the UI, e.g. DeepSeek for chat and OpenAI for embeddings. To add another OpenAI-compatible provider, add an entry with its base_url, its key_env (the environment variable holding its API key), and its models and embedding_models lists.
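The chat/embeddings pairing rule can be illustrated by resolving each role against the provider table separately. This sketch makes several assumptions. It uses a dict that mirrors providers.yaml instead of parsing the file; with PyYAML installed, `yaml.safe_load` would produce the same shape. The `openai` entry and its model names are assumed for illustration, and `resolve` is a hypothetical helper, not the project's config.py API.

```python
import os
from typing import Literal, Mapping

# Mirrors providers.yaml; the "openai" entry is an assumed example.
PROVIDERS = {
    "deepseek": {
        "base_url": "https://api.deepseek.com",
        "key_env": "DEEPSEEK_API_KEY",
        "models": ["deepseek-chat"],
        "embedding_models": [],
    },
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "key_env": "OPENAI_API_KEY",
        "models": ["gpt-4o-mini"],
        "embedding_models": ["text-embedding-3-small"],
    },
}


def resolve(
    providers: dict,
    name: str,
    model: str,
    kind: Literal["chat", "embedding"],
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Look up connection settings for one role (chat or embedding)."""
    cfg = providers[name]
    available = cfg["models"] if kind == "chat" else cfg["embedding_models"]
    if model not in available:
        raise ValueError(f"{name} offers no {kind} model {model!r}; available: {available}")
    # The API key is read from the environment variable named by key_env.
    key = env.get(cfg["key_env"])
    if not key:
        raise RuntimeError(f"set {cfg['key_env']} in .env")
    return {"base_url": cfg["base_url"], "api_key": key, "model": model}
```

Resolving DeepSeek for chat and OpenAI for embeddings succeeds. Asking DeepSeek for an embeddings model raises an error, because its `embedding_models` list is empty. The resulting `base_url` and `api_key` are the two values an OpenAI-compatible client constructor typically needs.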

Running

Streamlit UI

```
streamlit run app.py
```

Upload ro-crate-metadata.json and one or more profile .md files, select the chat and embeddings providers, optionally describe the intended audience, and select Validate.

CLI

```
python validation.py path/to/ro-crate-metadata.json path/to/profile.md
```

Docker

```
docker build -t rocrate-validator .
docker run --env-file .env -p 8501:8501 rocrate-validator
```

Then open http://localhost:8501.

Tests

```
pip install -r requirements-dev.txt
pytest
```

Tests mock the LLM and embeddings.
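The sketch below shows one way such a test can mock both dependencies, using `unittest.mock.Mock` from the standard library. `validate_entity` here is a hypothetical stand-in written for the example. It is not the project's `CrateValidator.validate_entity`, and the test is not taken from the tests folder.

```python
from unittest.mock import Mock


def validate_entity(entity, get_rules, chat):
    # Hypothetical stand-in: fetch rules for the entity's type, then ask the model.
    rules = get_rules(entity["@type"])
    return chat({"rules": rules, "entity": entity})


def test_validate_entity_uses_retrieved_rules():
    # Both the retriever and the chat model are mocks, so no provider is called.
    get_rules = Mock(return_value=["A Dataset MUST have a name."])
    chat = Mock(return_value={"compliant": False, "issues": ["missing name"]})

    result = validate_entity({"@id": "./", "@type": "Dataset"}, get_rules, chat)

    get_rules.assert_called_once_with("Dataset")
    sent = chat.call_args.args[0]
    assert sent["rules"] == ["A Dataset MUST have a name."]
    assert result["compliant"] is False
```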
