This guide explains how Fortémi automatically constructs and traverses knowledge graphs.
When you create a note, Fortémi's NLP pipeline automatically constructs the knowledge graph across three phases:
Phase 1 (parallel):
1. AI concept tagging: Generates 8-15 SKOS concept tags across 6 dimensions (domain, topic, methodology, application, technique, content-type)

Phase 2 (after tagging completes):
2. Related concept inference: Uses the LLM to identify cross-dimensional associative relationships (e.g., technique/attention-mechanism related to domain/machine-learning) and creates skos:related edges with confidence scores. Skips notes with fewer than 3 leaf concepts.
3. Tag-enriched embeddings: Converts content + SKOS concept labels and their hierarchical relationships (broader, narrower, related) into vectors, producing semantically richer embeddings than content alone

Phase 3 (after embedding completes):
4. Tag-boosted similarity: Blends embedding cosine similarity with SKOS tag overlap using a configurable weight formula:
   final_score = (embedding_sim × (1 - tag_weight)) + (tag_overlap × tag_weight)
5. HNSW diverse neighbor selection: Uses Algorithm 4 (Malkov & Yashunin 2018) to select up to k neighbors that are closer to the source than to already-selected neighbors, preventing star topology on clustered data
6. Reciprocal link creation: Creates bidirectional links with metadata (strategy, k, rank, tag_weight)
This ordering is critical: concept relationships are inferred before embeddings are generated, so embeddings incorporate the full concept graph context. Tags, relationships, and embeddings all inform linking. The result is significantly higher-quality connections than pure embedding similarity alone.
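The diverse neighbor selection step can be illustrated with a small sketch. This is not Fortémi's implementation; it is a minimal rendering of the heuristic as described above (keep a candidate only if it is closer to the source than to every neighbor already selected), assuming cosine distance and plain Python lists as vectors. The function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_diverse_neighbors(source, candidates, k):
    """Pick up to k candidate ids, skipping near-duplicates of earlier picks.

    candidates maps a note id to its embedding vector (hypothetical shape).
    """
    # Visit candidates from nearest to farthest from the source.
    ordered = sorted(
        candidates, key=lambda cid: cosine_distance(source, candidates[cid])
    )
    selected = []
    for cid in ordered:
        if len(selected) >= k:
            break
        to_source = cosine_distance(source, candidates[cid])
        # Keep the candidate only if the source is closer to it than
        # every neighbor that has already been selected.
        if all(
            to_source < cosine_distance(candidates[cid], candidates[s])
            for s in selected
        ):
            selected.append(cid)
    return selected
```

In a tight cluster, the second and third nearest candidates are usually closer to the first pick than to the source, so they are skipped in favor of candidates from other directions.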
| Type | Creation | Directionality |
|---|---|---|
| Semantic | Automatic (embedding similarity + tag overlap) | Bidirectional |
| Explicit | Manual (user-defined or [[wiki-style]] links) | Directional |
Retrieve all links for a specific note:
curl "http://localhost:3000/api/v1/notes/{id}/links"

Response:
{
"outgoing": [
{
"to_note_id": "uuid",
"score": 0.85,
"kind": "semantic"
}
],
"incoming": [
{
"from_note_id": "uuid",
"score": 0.78,
"kind": "semantic"
}
]
}

Traverse the knowledge graph starting from any note. Returns a versioned payload with nodes, edges, and truncation/guardrail metadata.
curl "http://localhost:3000/api/v1/graph/{id}?depth=2&max_nodes=50"

Parameters:

| Param | Default | Range | Description |
|---|---|---|---|
| depth | 2 | 0-10 | Maximum hops to traverse |
| max_nodes | 50 | 1-1000 | Limit total nodes returned |
| min_score | 0.0 | 0.0-1.0 | Minimum edge score threshold |
| max_edges_per_node | unlimited | 1-1000 | Cap edges per hub node |
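A client can check the documented ranges before sending a request. The sketch below builds the traversal URL and rejects out-of-range values locally. The helper name `graph_url` and the choice to raise rather than clamp are assumptions made for illustration; the server applies its own clamping regardless.

```python
from urllib.parse import urlencode

# Ranges copied from the parameter table above.
PARAM_RANGES = {
    "depth": (0, 10),
    "max_nodes": (1, 1000),
    "min_score": (0.0, 1.0),
    "max_edges_per_node": (1, 1000),
}


def graph_url(base_url, note_id, **params):
    """Build a graph traversal URL, validating parameters client-side."""
    for name, value in params.items():
        if name not in PARAM_RANGES:
            raise ValueError(f"unknown parameter: {name}")
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
    url = f"{base_url}/api/v1/graph/{note_id}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url
```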
Response (v1):
{
"graph_version": "v1",
"nodes": [
{
"id": "uuid",
"title": "Note Title",
"depth": 0,
"collection_id": "uuid-or-null",
"archived": false,
"created_at_utc": "2026-01-15T10:00:00Z",
"updated_at_utc": "2026-02-01T14:30:00Z"
},
{
"id": "uuid",
"title": "Related Note",
"depth": 1,
"collection_id": null,
"archived": false,
"created_at_utc": "2026-01-20T08:00:00Z",
"updated_at_utc": "2026-01-20T08:00:00Z"
}
],
"edges": [
{
"source": "uuid",
"target": "uuid",
"edge_type": "semantic",
"score": 0.85,
"rank": 1,
"computed_at": "2026-01-20T08:01:00Z"
},
{
"source": "uuid",
"target": "uuid",
"edge_type": "explicit",
"score": 1.0,
"rank": 1,
"computed_at": "2026-02-01T14:30:00Z"
}
],
"meta": {
"total_nodes": 127,
"total_edges": 342,
"truncated_nodes": 77,
"truncated_edges": 0,
"effective_depth": 2,
"effective_max_nodes": 50,
"effective_min_score": 0.0,
"truncation_reasons": [
"max_nodes limit: 50 of 127 nodes returned"
]
}
}

Node fields:

| Field | Type | Description |
|---|---|---|
| id | UUID | Note identifier |
| title | string/null | Note title |
| depth | integer | Hops from starting node (0 = start) |
| collection_id | UUID/null | Parent collection |
| archived | boolean | Archive status |
| created_at_utc | datetime | Creation timestamp |
| updated_at_utc | datetime | Last update timestamp |
| community_id | integer/null | Community cluster ID (when available) |
| community_label | string/null | Community label (when available) |
| community_confidence | float/null | Community assignment confidence |
Edge fields:
| Field | Type | Description |
|---|---|---|
| source | UUID | Origin note ID |
| target | UUID | Destination note ID |
| edge_type | string | "semantic" or "explicit" |
| score | float | Similarity score (0.0-1.0) |
| rank | integer/null | Rank among edges from source node |
| embedding_set | string/null | Embedding set used (provenance) |
| model | string/null | Model used for embedding (provenance) |
| computed_at | datetime/null | When the link was computed |
Meta fields (guardrails):
| Field | Type | Description |
|---|---|---|
| total_nodes | integer | Total reachable nodes before truncation |
| total_edges | integer | Total qualifying edges before truncation |
| truncated_nodes | integer | Nodes omitted due to limits |
| truncated_edges | integer | Edges omitted due to limits |
| effective_depth | integer | Depth actually applied (after server clamping) |
| effective_max_nodes | integer | Max nodes actually applied |
| effective_min_score | float | Score threshold actually applied |
| effective_max_edges_per_node | integer/null | Per-node edge limit (null = unlimited) |
| truncation_reasons | string[] | Human-readable truncation explanations |
Legacy compatibility: The v1 contract replaces the previous unversioned response. Clients should check for graph_version to detect the payload format. Edge fields changed from from_id/to_id/kind to source/target/edge_type.
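A client that must handle both formats could normalize edges as sketched below. Only the three renamed edge fields and the graph_version check come from the note above; the helper names are hypothetical, and the sketch assumes the legacy payload also keeps its edges under an "edges" key.

```python
# Legacy edge field names mapped to their v1 equivalents.
LEGACY_EDGE_KEYS = {"from_id": "source", "to_id": "target", "kind": "edge_type"}


def normalize_edge(edge):
    """Rename legacy edge keys to v1 names, leaving other keys untouched."""
    return {LEGACY_EDGE_KEYS.get(key, key): value for key, value in edge.items()}


def normalized_edges(payload):
    """Return edges in v1 shape for either payload format."""
    edges = payload.get("edges", [])
    if "graph_version" in payload:
        # Versioned payloads already use source/target/edge_type.
        return list(edges)
    return [normalize_edge(edge) for edge in edges]
```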
| Score | Relationship |
|---|---|
| 90%+ | Nearly identical topics |
| 80-90% | Strongly related |
| 70-80% | Related (link threshold) |
| 60-70% | Tangentially related (no link) |
| <60% | Different topics |
The 70% threshold was empirically chosen to balance:
- Precision - Avoiding spurious connections
- Recall - Discovering meaningful relationships
Higher thresholds miss valid relationships; lower thresholds create noise.
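The score bands can be expressed as a small lookup, for example when labeling links in a UI. This is an illustrative sketch: the table does not say which band a boundary value such as exactly 0.80 belongs to, so treating lower bounds as inclusive is an assumption, as are the function names.

```python
# (lower bound, label) pairs taken from the score table, highest first.
SCORE_BANDS = [
    (0.90, "Nearly identical topics"),
    (0.80, "Strongly related"),
    (0.70, "Related (link threshold)"),
    (0.60, "Tangentially related (no link)"),
]

LINK_THRESHOLD = 0.70


def describe_score(score):
    """Map a similarity score in [0.0, 1.0] to its relationship label."""
    for lower_bound, label in SCORE_BANDS:
        if score >= lower_bound:
            return label
    return "Different topics"


def should_link(score, threshold=LINK_THRESHOLD):
    """Return True when the score meets the link threshold."""
    return score >= threshold
```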
When SKOS tags are available (the default for all notes processed by the NLP pipeline), the linking system blends embedding similarity with tag overlap. This means two notes about "machine learning" that share SKOS concepts like domain/ai/machine-learning will score higher than their raw embedding similarity suggests.
The tag weight is configurable per linking strategy. A fallback ensures that even if the tag-boosted heuristic selects nothing, the single best embedding match is still linked to prevent note isolation.
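The blending formula and the fallback can be sketched as follows. The formula is the one given earlier in this guide; everything else is an assumption for illustration: the candidate tuple shape, applying the 0.70 threshold to the blended score, and reporting the raw embedding similarity for the fallback link.

```python
def blended_score(embedding_sim, tag_overlap, tag_weight):
    """final_score = embedding_sim * (1 - tag_weight) + tag_overlap * tag_weight"""
    return embedding_sim * (1.0 - tag_weight) + tag_overlap * tag_weight


def pick_links(candidates, tag_weight, threshold=0.70):
    """Select (note_id, score) links from candidates.

    candidates is a list of (note_id, embedding_sim, tag_overlap) tuples,
    a hypothetical shape chosen for this sketch.
    """
    scored = [
        (note_id, blended_score(sim, overlap, tag_weight))
        for note_id, sim, overlap in candidates
    ]
    links = [(note_id, score) for note_id, score in scored if score >= threshold]
    if not links and candidates:
        # Fallback: link the single best embedding match so the note
        # is never left isolated.
        best = max(candidates, key=lambda c: c[1])
        links = [(best[0], best[1])]
    return links
```

With tag_weight = 0.25, a pair with embedding similarity 0.8 and tag overlap 0.5 scores 0.8 × 0.75 + 0.5 × 0.25 = 0.725.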
Find related context when writing:
# Get notes related to what you're writing
curl "http://localhost:3000/api/v1/notes/{draft_id}/links"

Explore topic clusters via graph traversal:
# Find all notes within 2 hops of a seed note
curl "http://localhost:3000/api/v1/graph/{seed_id}/explore?depth=2"

Notes with few links may indicate:
- Novel topics (good)
- Poorly integrated knowledge (needs attention)
Build breadcrumb trails through related content for users exploring the knowledge base.
Every semantic link is bidirectional. The "incoming" links show which notes reference the current note, which is useful for understanding how concepts connect.
{
"incoming": [
{
"from_note_id": "uuid",
"from_note_title": "Machine Learning Basics",
"score": 0.82
}
]
}

- Embedding generation: ~500ms per note (GPU) / ~2s (CPU)
- Similarity computation: O(N) against existing notes
- Link creation: Batched, async via job queue
- Single-hop: <10ms
- Multi-hop (depth=3): <50ms
- Uses recursive CTE for efficiency
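To show what a recursive CTE traversal looks like, here is a self-contained sketch using SQLite from Python's standard library. The `links` table and its column names are hypothetical, and Fortémi's actual schema, database, and query are not shown in this guide. The depth bound combined with UNION keeps the recursion finite even when the graph contains cycles.

```python
import sqlite3


def reachable_notes(conn, start_id, max_depth):
    """Return (note_id, min_depth) rows for notes within max_depth hops."""
    query = """
        WITH RECURSIVE reach(note_id, depth) AS (
            SELECT ?, 0
            UNION
            SELECT l.to_note_id, r.depth + 1
            FROM reach AS r
            JOIN links AS l ON l.from_note_id = r.note_id
            WHERE r.depth < ?
        )
        SELECT note_id, MIN(depth)
        FROM reach
        GROUP BY note_id
        ORDER BY MIN(depth), note_id
    """
    return conn.execute(query, (start_id, max_depth)).fetchall()
```

Each note is reported once, at the smallest depth at which it was reached.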
Understanding how the knowledge graph builds itself helps you get the most from it:
- Create a note via API or MCP capture_knowledge
- Phase 1 NLP jobs run in parallel: AI revision, title generation, concept tagging, metadata extraction, document type inference
- Concept tagging completes: the note now has 8-15 hierarchical SKOS tags
- Related concept inference runs: the LLM identifies associative relationships between the note's concepts and creates skos:related edges
- Tag-enriched embedding is queued: embedding generation uses concept labels (prefixed clustering:) and their relationships (broader, narrower, related) for richer vectors; high-frequency concepts are filtered via TF-IDF to prevent generic terms from dominating
- Tag-boosted linking is queued: new connections appear using both embeddings and tag overlap
- The knowledge graph grows: new connections appear automatically
- Graph maintenance runs periodically: a GraphMaintenance job applies the full quality pipeline (normalization → SNN → PFNET sparsification → Louvain community detection → diagnostics snapshot)
You don't need to trigger any of these steps manually. The entire pipeline runs in the background via the job queue.
To re-run the note pipeline (e.g., after a model upgrade), use bulk_reprocess_notes via MCP or the REST API. To trigger graph maintenance immediately, use POST /api/v1/graph/maintenance or the trigger_graph_maintenance MCP tool.
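For scripted use, the maintenance trigger can be issued from Python's standard library. The sketch below only builds the request. The base URL matches the curl examples in this guide, the helper name is hypothetical, and sending an empty body is an assumption. The response format is not documented here beyond returning a job ID, so the sketch does not parse it.

```python
import urllib.request


def maintenance_request(base_url="http://localhost:3000"):
    """Build a POST request for the graph maintenance endpoint."""
    return urllib.request.Request(
        f"{base_url}/api/v1/graph/maintenance",
        data=b"",  # assumption: the endpoint needs no request body
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends the request.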
The graph maintenance job applies a four-step quality pipeline to the entire knowledge graph.
Raw similarity scores are normalized using a configurable gamma parameter. This corrects for embedding model bias and produces a more uniform score distribution.
normalized_score = score ^ GRAPH_NORMALIZATION_GAMMA
Environment variable: GRAPH_NORMALIZATION_GAMMA (default: 1.0)
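A quick sketch of the normalization, assuming scores lie in [0.0, 1.0] (the function name is illustrative). With the default gamma of 1.0 scores are unchanged; for scores strictly between 0 and 1, a gamma below 1.0 raises them and a gamma above 1.0 lowers them.

```python
def normalize_score(score, gamma=1.0):
    """Apply normalized_score = score ^ gamma to a score in [0.0, 1.0]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0.0, 1.0], got {score}")
    return score ** gamma
```

For example, a raw score of 0.25 becomes 0.5 with gamma = 0.5 and 0.0625 with gamma = 2.0.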
Normalizes per-note edge scores by comparing each note's neighborhood to its neighbors' neighborhoods. Notes that share many neighbors get stronger links; isolated connections are weakened.
- k parameter (GRAPH_SNN_K): Number of nearest neighbors considered per node
- Prune threshold (GRAPH_SNN_PRUNE_THRESHOLD): Minimum SNN score to retain an edge
SNN is effective at breaking the "seashell pattern" — a topology defect where one highly-connected hub pulls many notes into artificial proximity, producing a star-shaped cluster rather than a meaningful topic cluster.
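The idea can be sketched with one common SNN formulation: the score of an edge is the fraction of the k nearest neighbors that its two endpoints share. This guide does not state Fortémi's exact formula, so treat the scoring rule, the function names, and the adjacency-dict input shape as assumptions.

```python
def top_k(neighbors, k):
    """Return the ids of the k highest-scoring neighbors."""
    return set(sorted(neighbors, key=neighbors.get, reverse=True)[:k])


def snn_scores(graph, k, prune_threshold):
    """Score each edge by shared nearest neighbors and prune weak ones.

    graph maps node -> {neighbor: similarity}. The result maps
    (node, neighbor) -> SNN score for the edges that survive pruning.
    """
    knn = {node: top_k(neighbors, k) for node, neighbors in graph.items()}
    kept = {}
    for node, neighbors in graph.items():
        for other in neighbors:
            shared = knn[node] & knn.get(other, set())
            score = len(shared) / k
            if score >= prune_threshold:
                kept[(node, other)] = score
    return kept
```

An edge between a note and a hub it has nothing else in common with scores 0 under this rule, no matter how high its raw similarity is.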
Pathfinder Network (PFNET) pruning removes edges that are redundant given transitive paths. An edge (A→C) is removed if a path A→B→C exists where both segments are stronger than the direct connection.
Environment variable: GRAPH_PFNET_Q — controls the pathfinder metric space. Higher values preserve more edges.
PFNET produces a sparser, more interpretable graph where each retained edge represents a genuinely direct relationship.
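The pruning rule described above can be sketched for two-hop paths on an undirected similarity graph. This is a simplification for illustration only: it does not model the GRAPH_PFNET_Q parameter, it ignores longer paths, and the function name and input shape are hypothetical.

```python
def prune_redundant_edges(weights):
    """Drop edges that a stronger two-hop path makes redundant.

    weights maps frozenset({a, b}) -> similarity for an undirected graph.
    Edge A-C is dropped when some B exists with both A-B and B-C
    strictly stronger than A-C.
    """
    nodes = set().union(*weights)

    def weight(a, b):
        return weights.get(frozenset((a, b)), 0.0)

    kept = {}
    for edge, direct in weights.items():
        a, c = tuple(edge)
        redundant = any(
            weight(a, b) > direct and weight(b, c) > direct
            for b in nodes - edge
        )
        if not redundant:
            kept[edge] = direct
    return kept
```

Redundancy is evaluated against the original weights rather than the progressively pruned graph, so the result does not depend on iteration order.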
Applies the Louvain algorithm to identify topic communities in the sparsified graph. Each note is assigned a community_id and community_label.
Environment variable: GRAPH_COMMUNITY_RESOLUTION — controls community granularity (higher = more, smaller communities)
For large knowledge bases, coarse community detection uses MRL 64-dimensional embeddings for efficiency via POST /api/v1/graph/community/coarse.
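Assuming MRL here refers to Matryoshka Representation Learning, a 64-dimensional embedding is typically obtained by keeping the leading 64 components of the full vector and re-normalizing to unit length. The sketch below shows that step only; the function name is hypothetical and this guide does not confirm how Fortémi derives its coarse embeddings.

```python
import math


def truncate_embedding(vector, dims=64):
    """Keep the first dims components and rescale to unit length."""
    prefix = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return prefix
    return [x / norm for x in prefix]
```

Shorter vectors make pairwise similarity cheaper to compute, which is why a coarse pass suits large knowledge bases.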
Community assignments appear on graph nodes in GET /api/v1/graph/{id} responses:
{
"id": "uuid",
"title": "Note Title",
"community_id": 3,
"community_label": "machine-learning",
"community_confidence": 0.87
}

After each maintenance run, a diagnostics snapshot is captured automatically. Snapshots record graph health metrics (node count, edge count, average degree, community count, isolated node count) and enable trend tracking over time.
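The listed health metrics are straightforward to compute from a node list and an edge list. The sketch below shows one way to do it; the key names in the returned dict and the input shapes are assumptions, not the snapshot schema, and edges are treated as undirected with duplicates and self-loops ignored.

```python
def graph_health(node_ids, edges, communities=None):
    """Compute basic health metrics for an undirected graph.

    edges is an iterable of (source, target) pairs; communities maps
    node id -> community id or None.
    """
    degree = dict.fromkeys(node_ids, 0)
    # Collapse reciprocal pairs (a, b) and (b, a) into one undirected edge.
    unique_edges = {frozenset((s, t)) for s, t in edges if s != t}
    for edge in unique_edges:
        for node in edge:
            degree[node] = degree.get(node, 0) + 1
    node_count = len(degree)
    community_ids = {
        cid for cid in (communities or {}).values() if cid is not None
    }
    return {
        "node_count": node_count,
        "edge_count": len(unique_edges),
        "average_degree": 2 * len(unique_edges) / node_count if node_count else 0.0,
        "community_count": len(community_ids),
        "isolated_node_count": sum(1 for d in degree.values() if d == 0),
    }
```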
POST /api/v1/graph/maintenance

Queues a GraphMaintenance job that runs the full quality pipeline (normalize → SNN → PFNET → Louvain → diagnostics). Returns the job ID for status tracking.

POST /api/v1/graph/snn/recompute

Recomputes Shared Nearest Neighbor scores without running the full pipeline. Useful after bulk note imports.

POST /api/v1/graph/pfnet/sparsify

Applies PFNET pruning to the current graph state.

POST /api/v1/graph/community/coarse

Runs Louvain community detection using MRL 64-dimensional embeddings. Efficient for large knowledge bases.
# Capture a diagnostics snapshot
POST /api/v1/graph/diagnostics/snapshot
# List all snapshots
GET /api/v1/graph/diagnostics/history
# Compare two snapshots
GET /api/v1/graph/diagnostics/compare?from={snapshot_id}&to={snapshot_id}

The compare endpoint highlights changes in graph health between two points in time, which is useful for validating that maintenance improved graph quality.
Two MCP tools support graph maintenance workflows:
| Tool | Description |
|---|---|
| trigger_graph_maintenance | Queue the full 4-step maintenance pipeline |
| coarse_community_detection | Run community detection using MRL 64-dim embeddings |
These are part of the manage_graphs discriminated-union tool group. The MCP server now exposes 37 total core tools (was 35).
| Variable | Default | Description |
|---|---|---|
| GRAPH_NORMALIZATION_GAMMA | 1.0 | Score normalization exponent |
| GRAPH_SNN_K | (system default) | Nearest neighbors for SNN computation |
| GRAPH_SNN_PRUNE_THRESHOLD | (system default) | Minimum SNN score to retain an edge |
| GRAPH_PFNET_Q | (system default) | PFNET pathfinder metric space parameter |
| GRAPH_COMMUNITY_RESOLUTION | (system default) | Louvain community granularity |
| GRAPH_STRUCTURAL_SCORE | (system default) | Weight for structural vs. similarity scores in linking |
| EMBED_CONCEPT_MAX_DOC_FREQ | (system default) | TF-IDF document frequency cutoff for concept filtering in embeddings |
As your knowledge base evolves, older embeddings may become less representative. Consider periodic re-embedding for large, long-lived collections using bulk_reprocess_notes.
New knowledge bases have sparse graphs until sufficient content accumulates. Expect to need roughly 10 notes before meaningful connections appear.
Notes on completely different topics won't link semantically, even if you want to connect them. Use explicit [[wiki-style]] links in note content for cross-domain connections.
See also: Search Guide | Tags Guide | Research Background