| title | JSON Schema Mappings — Index any JSON format declaratively |
|---|---|
| description | Map any JSON schema to typed RTFM chunks via declarative YAML files in .rtfm/mappings/. Replaces format-specific Python parsers. Used for NotebookLM exports, Linear, OpenAPI specs, structured logs. |
RTFM lets you declaratively map any JSON schema to chunks and edges via
small YAML files. Drop a mapping into .rtfm/mappings/, RTFM reads it at sync
time, and matching JSON files in your project are extracted into typed chunks
without writing a single line of Python.
This is how RTFM stays generic: instead of shipping format-specific parsers for every JSON-based tool out there (NotebookLM exports, Linear exports, Notion dumps, OpenAPI specs, structured logs…), the project that produces the format ships its own mapping. Anyone can add support for any JSON schema in ~30 lines of YAML.
A NotebookLM batch export answer.json:
```json
{
  "type": "nblm-answer",
  "asked_at": "2026-05-04T13:30:00Z",
  "notebook": { "id": "n-1", "url": "https://notebooklm.google.com/..." },
  "question": "What is the OSBD process?",
  "answer": { "text": "OSBD is the four-step acronym at the core of CNV..." },
  "citations": [
    { "marker": "[1]", "source_name": "Keller.pdf", "source_text": "Observation neutre..." },
    { "marker": "[2]", "source_name": "Rosenberg.pdf", "source_text": "..." }
  ]
}
```

The matching mapping at `.rtfm/mappings/nblm-answer.yaml`:
```yaml
# yaml-language-server: $schema=https://schemas.roomi-fields.com/rtfm-mapping-v1.json
name: nblm-answer-v1
match:
  schema_url: "https://schemas.roomi-fields.com/nblm-answer-v1.json"
  discriminator:
    type: nblm-answer
chunks:
  - title: "Q: {{ question }}"
    content: "{{ answer.text }}"
    metadata:
      notebook_id: "{{ notebook.id }}"
      asked_at: "{{ asked_at }}"
  - foreach: citations
    title: "{{ marker }} {{ source_name }}"
    content: "{{ source_text }}"
    metadata:
      source_name: "{{ source_name }}"
      citation_marker: "{{ marker }}"
edges:
  - relation: cites
    foreach: citations
    target: "{{ source_name }}"
```

After `rtfm sync`, the JSON file produces:
- 1 chunk for the answer body, retrievable by question keywords or answer text
- N chunks for citations, each independently retrievable by source name or excerpt content
- Typed metadata in `chunks.metadata` (`notebook_id`, `source_name`, `citation_marker`…)
- `cites` edge candidates per citation
```yaml
name: my-mapping          # required, unique per project
match: { ... }            # required, see "Match rules"
chunks: [ ... ]           # at least one required
edges: [ ... ]            # optional
```

A mapping is applied to a JSON document when any declared rule matches.
| Field | Description |
|---|---|
| `match.schema_url` | Matches the document's `$schema` or `$id` field |
| `match.discriminator` | Dict of `{ field_path: expected_value }`; all entries must match |
Discriminator paths support dotted notation (meta.kind: foo walks into
nested objects).
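The match rules above can be sketched in a few lines of Python. This is an illustrative reading of the documented behavior, not RTFM's actual code: the names `lookup` and `mapping_matches` are hypothetical, and the sketch assumes the schema URL rule and the discriminator rule are tried independently, with either one sufficient.

```python
from typing import Any

_MISSING = object()  # sentinel: distinguishes "absent" from a JSON null


def lookup(doc: Any, dotted_path: str) -> Any:
    """Walk nested dicts along a dotted path such as "meta.kind"."""
    current = doc
    for segment in dotted_path.split("."):
        if isinstance(current, dict) and segment in current:
            current = current[segment]
        else:
            return _MISSING
    return current


def mapping_matches(match: dict, doc: dict) -> bool:
    """Return True when any declared rule matches the document (hypothetical helper)."""
    schema_url = match.get("schema_url")
    if schema_url is not None and schema_url in (doc.get("$schema"), doc.get("$id")):
        return True
    discriminator = match.get("discriminator")
    if discriminator:
        # Every discriminator entry has to match for this rule to fire.
        return all(
            lookup(doc, path) == expected
            for path, expected in discriminator.items()
        )
    return False


match = {
    "schema_url": "https://schemas.roomi-fields.com/nblm-answer-v1.json",
    "discriminator": {"type": "nblm-answer"},
}
print(mapping_matches(match, {"type": "nblm-answer"}))  # True
print(mapping_matches(match, {"type": "other"}))        # False
```

The first call succeeds on the discriminator alone, which is why the NotebookLM example matches even though the document carries no `$schema` field.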
```yaml
chunks:
  - title: "..."          # template, becomes chapter_title
    content: "..."        # template, becomes chunk content (required, non-empty)
    foreach: <path>       # optional: emit one chunk per item in the list at <path>
    metadata: { ... }     # template values, stored in chunks.metadata
```

When `foreach` is set, templates inside the spec evaluate against each
item of the list. Otherwise they evaluate against the root document.
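To make the `foreach` behavior concrete, here is a minimal sketch of how chunk specs might expand against a document. The helpers `resolve`, `render`, and `expand_chunks` are hypothetical names introduced for illustration, the renderer handles dict paths only, and the output dict shape is an assumption rather than RTFM's internal chunk type.

```python
import re
from typing import Any

_EXPR = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def resolve(context: Any, path: str) -> Any:
    """Follow a dotted path through nested dicts; None when a key is absent."""
    current = context
    for segment in path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(segment)
    return current


def render(template: str, context: Any) -> str:
    """Replace each {{ path }} with its value, or "" when the path is missing."""
    def substitute(m: re.Match) -> str:
        value = resolve(context, m.group(1))
        return "" if value is None else str(value)
    return _EXPR.sub(substitute, template)


def expand_chunks(specs: list, doc: dict) -> list:
    chunks = []
    for spec in specs:
        if "foreach" in spec:
            # One context per list item; a missing or non-list path yields nothing.
            items = resolve(doc, spec["foreach"])
            contexts = items if isinstance(items, list) else []
        else:
            contexts = [doc]  # no foreach: evaluate against the root document
        for context in contexts:
            chunks.append({
                "title": render(spec.get("title", ""), context),
                "content": render(spec["content"], context),
                "metadata": {
                    key: render(str(value), context)
                    for key, value in spec.get("metadata", {}).items()
                },
            })
    return chunks


doc = {
    "question": "What is the OSBD process?",
    "answer": {"text": "OSBD is the four-step acronym..."},
    "citations": [
        {"marker": "[1]", "source_name": "Keller.pdf", "source_text": "Observation neutre..."},
        {"marker": "[2]", "source_name": "Rosenberg.pdf", "source_text": "..."},
    ],
}
specs = [
    {"title": "Q: {{ question }}", "content": "{{ answer.text }}"},
    {
        "foreach": "citations",
        "title": "{{ marker }} {{ source_name }}",
        "content": "{{ source_text }}",
        "metadata": {"citation_marker": "{{ marker }}"},
    },
]
chunks = expand_chunks(specs, doc)
print([c["title"] for c in chunks])
# ['Q: What is the OSBD process?', '[1] Keller.pdf', '[2] Rosenberg.pdf']
```

The first spec yields one chunk from the root document and the second yields one chunk per citation, matching the "1 chunk plus N chunks" outcome described for the NotebookLM example.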
```yaml
edges:
  - relation: cites       # stored as edges.relation_type
    foreach: <path>       # optional, like chunks
    target: "..."         # template producing the target reference
    target_kind: literal  # informational hint: literal | slug | url
```

Edge resolution to the database happens in the sync layer. Edges with
relation types import / link / include are resolved against indexed
files. Custom relation types (e.g. cites) currently land as
EdgeCandidates but are not yet materialized in the edges table — index
the cited sources alongside the JSON file if you want a navigable graph.
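The split between resolved and pending relation types can be pictured as follows. This is only a sketch: the `EdgeCandidate` fields shown here and the `partition_candidates` helper are assumptions made for illustration, since the page does not describe the real class beyond its name.

```python
from dataclasses import dataclass

# Relation types the text above says are resolved against indexed files.
RESOLVED_RELATIONS = {"import", "link", "include"}


@dataclass(frozen=True)
class EdgeCandidate:
    """Hypothetical shape; the real EdgeCandidate may carry other fields."""
    relation: str
    target: str
    target_kind: str = "literal"


def partition_candidates(candidates):
    """Split candidates into (resolvable now, kept as candidates only)."""
    resolvable = [c for c in candidates if c.relation in RESOLVED_RELATIONS]
    pending = [c for c in candidates if c.relation not in RESOLVED_RELATIONS]
    return resolvable, pending


candidates = [
    EdgeCandidate("cites", "Keller.pdf"),
    EdgeCandidate("link", "docs/intro.md", target_kind="slug"),
]
resolvable, pending = partition_candidates(candidates)
print([c.relation for c in resolvable])  # ['link']
print([c.relation for c in pending])     # ['cites']
```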
Template expressions use {{ dotted.path.to.field }} syntax. Rules:
- `{{ a.b.c }}` walks nested dicts/lists (numeric segments index lists)
- Missing paths render as empty strings
- No control flow, no expressions, no eval: paths only
Anything more sophisticated is intentionally out of scope. If your schema needs computation, write a Python parser instead.
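A renderer that follows these rules fits in about twenty lines. The sketch below is an assumption about how such a renderer could look, not RTFM's implementation: `walk` and `render` are hypothetical names, values are stringified with `str()`, and a JSON `null` is treated like a missing path.

```python
import re
from typing import Any

# Paths only: word characters, dots, "$" and "-" are accepted inside {{ }}.
_EXPR = re.compile(r"\{\{\s*([\w.$-]+)\s*\}\}")


def walk(context: Any, path: str) -> Any:
    """Follow a dotted path; numeric segments index into lists."""
    current = context
    for segment in path.split("."):
        if isinstance(current, dict) and segment in current:
            current = current[segment]
        elif (
            isinstance(current, list)
            and segment.isdecimal()
            and int(segment) < len(current)
        ):
            current = current[int(segment)]
        else:
            return None  # missing path
    return current


def render(template: str, context: Any) -> str:
    """Substitute every {{ path }}; missing paths become empty strings."""
    def substitute(m: re.Match) -> str:
        value = walk(context, m.group(1))
        return "" if value is None else str(value)
    return _EXPR.sub(substitute, template)


doc = {
    "question": "What is the OSBD process?",
    "citations": [{"source_name": "Keller.pdf"}, {"source_name": "Rosenberg.pdf"}],
}
print(render("Q: {{ question }}", doc))               # Q: What is the OSBD process?
print(render("{{ citations.1.source_name }}", doc))   # Rosenberg.pdf
print(repr(render("{{ notebook.id }}", doc)))         # ''
```

Because the regular expression only accepts path characters, anything resembling an expression (for example `{{ a + b }}`) is left untouched rather than evaluated.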
The mapping format itself is published as a JSON Schema at https://schemas.roomi-fields.com/rtfm-mapping-v1.json.
Reference it from your mapping files to get autocomplete and inline
validation in any IDE that supports JSON Schema (VS Code, JetBrains,
Neovim with coc-yaml, Helix, …):
=== "YAML"

    ```yaml
    # yaml-language-server: $schema=https://schemas.roomi-fields.com/rtfm-mapping-v1.json
    name: my-mapping
    match:
      discriminator: { type: foo }
    chunks:
      - content: "{{ payload }}"
    ```

    Requires the [YAML extension](https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml)
    in VS Code (or equivalent in your editor). The directive is a
    standard YAML comment, ignored at runtime.

=== "JSON"

    ```json
    {
      "$schema": "https://schemas.roomi-fields.com/rtfm-mapping-v1.json",
      "name": "my-mapping",
      "match": { "discriminator": { "type": "foo" } },
      "chunks": [{ "content": "{{ payload }}" }]
    }
    ```

    The `$schema` field is recognized natively by VS Code, JetBrains
    IDEs, and most editors with no extension required.
The schema is versioned: the v1 suffix freezes the contract.
Breaking changes ship as rtfm-mapping-v2.json and v1 stays valid
indefinitely.
A growing number of producing projects publish formal JSON Schemas for their outputs:
| Producer | Schema URL |
|---|---|
| notebooklm-mcp | https://schemas.roomi-fields.com/nblm-answer-v1.json |
When the JSON file you index has a $schema field pointing to one of
these URLs, you reference the same URL in your mapping's
match.schema_url. RTFM does an exact string match — no remote fetch,
no validation. The URL acts as a stable discriminator.
If you maintain a JSON-producing tool, publishing your schema makes RTFM integration trivial for any user: they paste your schema URL into a mapping file and they're done. Consider publishing alongside your project (a static JSON file behind any CDN or GitHub Pages suffices).
RTFM scans .rtfm/mappings/ (next to your library DB) at every Library
initialization. Drop .yaml, .yml, or .json files there. Subdirectories
are not scanned.
Malformed mapping files are silently skipped (log entry pending) — a bad mapping never breaks sync.
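The loading rules can be approximated like this. It is a sketch under stated assumptions, not the real loader: `load_mappings` is a hypothetical name, YAML parsing relies on the third-party PyYAML package if it happens to be installed, and treating a file without `name`, `match`, or a non-empty `chunks` list as malformed is a guess based on the required fields listed earlier.

```python
import json
from pathlib import Path

try:
    import yaml  # third-party PyYAML; optional in this sketch
except ImportError:
    yaml = None

MAPPING_SUFFIXES = {".yaml", ".yml", ".json"}


def load_mappings(mappings_dir: Path) -> list:
    """Load mapping files in sorted file-name order, skipping bad ones."""
    mappings = []
    if not mappings_dir.is_dir():
        return mappings
    # iterdir() does not recurse, so subdirectories are never scanned.
    for path in sorted(mappings_dir.iterdir(), key=lambda p: p.name):
        if not path.is_file() or path.suffix not in MAPPING_SUFFIXES:
            continue
        try:
            text = path.read_text(encoding="utf-8")
            if path.suffix == ".json":
                data = json.loads(text)
            elif yaml is not None:
                data = yaml.safe_load(text)
            else:
                continue  # no YAML parser available in this environment
        except Exception:
            continue  # malformed file: skip silently, never break sync
        # Assumption: missing required fields also count as malformed.
        if isinstance(data, dict) and "name" in data and "match" in data and data.get("chunks"):
            mappings.append(data)
    return mappings
```

Calling `load_mappings(Path(".rtfm/mappings"))` returns the parsed mappings in the same sorted order that later decides which mapping wins when several match.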
When the JSON parser encounters a .json file:
- Parse to a Python dict
- Ask `MappingRegistry.find_mapping(data)`: first match wins
- If matched: apply the mapping (chunks + edges)
- If not matched: fall back to the generic structural parser (one chunk per top-level key)
Order of evaluation is registration order. If two mappings can match the
same document, the one loaded first wins. File names in .rtfm/mappings/
are processed in sorted order.
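The dispatch steps and the first-match-wins rule can be sketched as below. The `MappingRegistry` class here is a simplified stand-in that borrows the name used above; its `register` method, the `choose_parser` and `generic_chunks` helpers, and the top-level-only discriminator check are all assumptions made to keep the example short.

```python
import json
from typing import Optional


class MappingRegistry:
    """Simplified stand-in: keeps mappings in registration order."""

    def __init__(self) -> None:
        self._mappings = []

    def register(self, mapping: dict) -> None:
        self._mappings.append(mapping)

    def find_mapping(self, data: dict) -> Optional[dict]:
        for mapping in self._mappings:  # registration order: first match wins
            if self._matches(mapping.get("match", {}), data):
                return mapping
        return None

    @staticmethod
    def _matches(match: dict, data: dict) -> bool:
        url = match.get("schema_url")
        if url is not None and url in (data.get("$schema"), data.get("$id")):
            return True
        # Simplification: only top-level discriminator keys are checked here.
        discriminator = match.get("discriminator") or {}
        return bool(discriminator) and all(
            data.get(key) == value for key, value in discriminator.items()
        )


def generic_chunks(data: dict) -> list:
    """Fallback described above: one chunk per top-level key."""
    return [{"title": key, "content": json.dumps(value)} for key, value in data.items()]


def choose_parser(registry: MappingRegistry, data: dict) -> str:
    mapping = registry.find_mapping(data)
    return mapping["name"] if mapping is not None else "generic"


registry = MappingRegistry()
# Registered in sorted file-name order, e.g. a-first.yaml before b-second.yaml.
registry.register({"name": "a-first", "match": {"discriminator": {"type": "foo"}}})
registry.register({"name": "b-second", "match": {"discriminator": {"type": "foo"}}})

print(choose_parser(registry, {"type": "foo"}))  # a-first
print(choose_parser(registry, {"type": "bar"}))  # generic
```

Since file names are processed in sorted order, prefixing mapping files with a number is one way to make the winning mapping predictable when two of them overlap.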
RTFM doesn't bundle any mappings. The NotebookLM project ships its own
nblm-answer.yaml — copy it into .rtfm/mappings/ and you're done. Same
goes for any other tool: the project that produces a JSON schema is best
positioned to define how it should be indexed.
This keeps RTFM honest about its scope: a generic retrieval layer that extends through pluggable conventions, not a registry of every format under the sun.
- Start here: NotebookLM `answer.json`. See the full RTFM × NotebookLM recipe for the ready-to-copy mapping and the markdown-only alternative path.
- OpenAPI specs: one chunk per `paths.<route>` operation, edges for `$ref`s.
- Linear/Jira exports: one chunk per issue, edges for `parent_id`, `blocks`.
- Test reports (Vitest, pytest JSON): one chunk per failure, metadata with `file`, `line`, `error_type`.
If you write a useful mapping, share it — open a PR adding the snippet to this page.