A semantic validation tool for RO-Crate
metadata. It checks each entity in an ro-crate-metadata.json against one or more
RO-Crate profile specifications, using retrieval-augmented generation (RAG) to
pull the relevant rules per entity and an LLM to judge both compliance and
usefulness for the intended audience.
- Bring your own model: any OpenAI-compatible provider (OpenAI, DeepSeek, Ollama, Together, …) via providers.yaml. Embeddings are configured independently of the chat model.
- Actionable output: each finding includes typed fix operations (add/replace/remove) with suggested values.
- Context-aware: determines whether metadata is useful for the intended audience.
- Streamlit UI and a CLI entry point.
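
To make the "typed fix operations" in the list above concrete, here is a minimal sketch of how add/replace/remove operations could be applied to a single crate entity. The `FixOp` class, its field names, and `apply_fixes` are hypothetical and only illustrate the idea; the tool's actual schema may differ.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Any, Literal


@dataclass
class FixOp:
    # Hypothetical shape of one suggested fix; not the tool's real schema.
    op: Literal["add", "replace", "remove"]
    prop: str
    value: Any = None


def apply_fixes(entity: dict, fixes: list) -> dict:
    """Return a patched copy of a JSON-LD entity; the input is left untouched."""
    patched = deepcopy(entity)
    for fix in fixes:
        if fix.op == "add":
            # Add only when the property is absent, so existing values survive.
            patched.setdefault(fix.prop, fix.value)
        elif fix.op == "replace":
            patched[fix.prop] = fix.value
        elif fix.op == "remove":
            patched.pop(fix.prop, None)
        else:
            raise ValueError(f"unknown fix operation: {fix.op!r}")
    return patched


entity = {"@id": "./", "@type": "Dataset", "name": "data", "keywords": ""}
fixes = [
    FixOp("add", "license", {"@id": "https://creativecommons.org/licenses/by/4.0/"}),
    FixOp("replace", "name", "Soil moisture measurements, 2021 field season"),
    FixOp("remove", "keywords"),
]
patched = apply_fixes(entity, fixes)
```

Working on a copy keeps the uploaded metadata intact, so suggested fixes can be previewed before anything is written back.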
flowchart LR
subgraph UI
APP[app.py<br/>Streamlit]
end
subgraph Core
CFG[config.py<br/>providers.yaml]
LLM[llm.py<br/>factories]
REG[profile_registry.py<br/>RAG]
VAL[validation.py<br/>CrateValidator]
end
APP --> CFG
APP --> LLM
APP --> REG
APP --> VAL
LLM --> REG
LLM --> VAL
VAL --> REG
sequenceDiagram
actor User
participant App as app.py
participant Reg as ProfileRegistry
participant Emb as Embeddings provider
participant Val as CrateValidator
participant Chat as Chat provider
User->>App: upload crate + profiles, pick provider
App->>Reg: ingest profile markdown
Reg->>Emb: embed chunks
Emb-->>Reg: vectors (Chroma)
loop per entity
App->>Val: validate_entity(entity, audience)
Val->>Reg: get_rules_for_entity(@type)
Reg-->>Val: relevant rules
Val->>Chat: prompt(rules + entity + audience)
Chat-->>Val: structured ValidationResult
end
Val-->>App: results
App-->>User: issues, fixes, quality, JSON report
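
The per-entity loop in the sequence diagram can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `StubRegistry` and `stub_chat` stand in for the real `ProfileRegistry` and chat provider, the function signatures are guesses, and the prompt layout and the fields of `ValidationResult` are invented for the example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    # Assumed fields; the real structured result likely carries more.
    entity_id: str
    issues: list = field(default_factory=list)


class StubRegistry:
    """Stands in for ProfileRegistry; a real one would query the vector store."""

    def __init__(self, rules_by_type):
        self.rules_by_type = rules_by_type

    def get_rules_for_entity(self, entity_type):
        # @type may be a single string or a list of strings in JSON-LD.
        types = entity_type if isinstance(entity_type, list) else [entity_type]
        return [rule for t in types for rule in self.rules_by_type.get(t, [])]


def stub_chat(prompt: str) -> str:
    """Stands in for the chat provider; always reports one issue as JSON."""
    return json.dumps({"issues": ["name is too generic"]})


def validate_entity(entity, audience, registry, chat):
    # Retrieve the rules for this entity type, then ask the model to judge it.
    rules = registry.get_rules_for_entity(entity.get("@type", "Thing"))
    prompt = "\n".join(
        ["Rules:", *rules, "Entity:", json.dumps(entity), "Audience:", audience]
    )
    reply = json.loads(chat(prompt))
    return ValidationResult(entity_id=entity["@id"], issues=reply["issues"])


crate = {
    "@graph": [
        {"@id": "./", "@type": "Dataset", "name": "data"},
        {"@id": "#alice", "@type": "Person", "name": "Alice"},
    ]
}
registry = StubRegistry({"Dataset": ["Dataset MUST have a license."]})
results = [
    validate_entity(e, "soil scientists", registry, stub_chat)
    for e in crate["@graph"]
]
```

Passing the registry and chat callable in as arguments keeps retrieval and judgement separable, which is also what makes both easy to replace with stubs.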
- Create and activate a virtual environment:

      python -m venv .venv && source .venv/bin/activate

- Install dependencies:

      pip install -r requirements.txt

- Provide API keys in a .env file in the project root. Each provider reads the key named by its key_env in providers.yaml:

      OPENAI_API_KEY=sk-...
      DEEPSEEK_API_KEY=sk-...  # only if using DeepSeek
Providers set in providers.yaml:

    deepseek:
      base_url: https://api.deepseek.com
      key_env: DEEPSEEK_API_KEY
      models: [deepseek-chat]
      embedding_models: []  # DeepSeek has no embeddings API

Because DeepSeek has no embeddings API, pair it with another embeddings provider in the UI, e.g. DeepSeek chat + OpenAI embeddings. To add a new OpenAI-compatible provider, add an entry with its base_url, the env var holding its key, and its model list.
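
As a rough illustration of how a provider entry and its key_env could be resolved at runtime, the sketch below mirrors the configuration as a plain dict so it needs no YAML parser. The `resolve` helper, the `role` argument, and the `openai` entry with its model names are assumptions made for the example and are not taken from the project.

```python
import os

PROVIDERS = {
    "deepseek": {
        "base_url": "https://api.deepseek.com",
        "key_env": "DEEPSEEK_API_KEY",
        "models": ["deepseek-chat"],
        "embedding_models": [],
    },
    # Hypothetical second entry, added only to show chat/embeddings pairing.
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "key_env": "OPENAI_API_KEY",
        "models": ["gpt-4o-mini"],
        "embedding_models": ["text-embedding-3-small"],
    },
}


def resolve(providers, name, role, env=os.environ):
    """Pick the first model for a role ('chat' or 'embeddings') and look up its key."""
    cfg = providers[name]
    models = cfg["models"] if role == "chat" else cfg["embedding_models"]
    if not models:
        raise ValueError(f"provider {name!r} offers no {role} models")
    key = env.get(cfg["key_env"])
    if not key:
        raise KeyError(f"set {cfg['key_env']} in .env or the environment")
    return {"base_url": cfg["base_url"], "api_key": key, "model": models[0]}


# A fake environment keeps the example independent of real keys.
fake_env = {"DEEPSEEK_API_KEY": "sk-test", "OPENAI_API_KEY": "sk-test"}
chat = resolve(PROVIDERS, "deepseek", "chat", env=fake_env)
embeddings = resolve(PROVIDERS, "openai", "embeddings", env=fake_env)
```

Asking the `deepseek` entry for embeddings raises a `ValueError` here, which is the programmatic counterpart of pairing it with a separate embeddings provider.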
Run the Streamlit UI:

    streamlit run app.py

Upload ro-crate-metadata.json and one or more profile .md files, select the chat and embeddings providers, optionally describe the intended audience, and select Validate.

Run from the command line:

    python validation.py path/to/ro-crate-metadata.json path/to/profile.md

Run with Docker:

    docker build -t rocrate-validator .
    docker run --env-file .env -p 8501:8501 rocrate-validator

Then open http://localhost:8501.

Run the tests:

    pip install -r requirements-dev.txt
    pytest

Tests mock the LLM and embeddings.
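
For readers unfamiliar with this style of testing, a mocked chat call and a deterministic fake embedding might look roughly like the sketch below. The `judge` function and `fake_embed` are hypothetical stand-ins written for this example; the project's real test suite and the units it exercises are not shown here.

```python
import hashlib
import json
from unittest.mock import Mock


def fake_embed(texts):
    """Deterministic stand-in for an embeddings call: 4 floats derived from a hash."""
    return [
        [b / 255 for b in hashlib.sha256(t.encode()).digest()[:4]] for t in texts
    ]


def judge(entity, chat):
    """Hypothetical unit under test: asks the chat callable and parses its JSON reply."""
    return json.loads(chat(json.dumps(entity)))["issues"]


def test_judge_with_mocked_chat():
    # The mock returns a canned reply, so no provider is contacted.
    chat = Mock(return_value='{"issues": ["missing license"]}')
    assert judge({"@id": "./"}, chat) == ["missing license"]
    chat.assert_called_once()


def test_fake_embeddings_are_deterministic():
    assert fake_embed(["a"]) == fake_embed(["a"])
    assert len(fake_embed(["a", "b"])) == 2
    assert len(fake_embed(["a"])[0]) == 4
```

Hash-based vectors carry no semantic meaning, but they are stable across runs, which is enough to exercise storage and retrieval plumbing.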