Merged
4 changes: 1 addition & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -112,8 +112,6 @@ fun contentPipeline() = workflow("content-pipeline") {
agent("reviewer") { role = "Content Reviewer"; model = Models.GEMINI_3_1_PRO }
}

rubrics { rubric("content-quality", "content-quality.md") }

state {
input("topic", VarType.STRING)
variable("draft", VarType.STRING, "the full written article text")
@@ -126,7 +124,7 @@ fun contentPipeline() = workflow("content-pipeline") {
agent = "writer"
prompt = "Write a short article about {topic}. {recommendation}"
writes("draft")
rubric = "content-quality"
rubric = "content-quality.md"
onScore {
whenScore lessThan 70.0 goto "write" // score too low – retry
}
9 changes: 5 additions & 4 deletions docs/developer-guide-core.md
@@ -182,7 +182,7 @@ map keyed by node ID. For `StandardNode`s with `writes` declared, it routes the
- **Single write**: attempts to parse the response as JSON and extract the declared key; falls back to the full raw text if the key is absent or the output is not valid JSON.
- **Multiple writes**: the response is parsed as JSON and each declared key is extracted into context.

**`RubricPostProcessor`** — If a node has a `rubricId`, this processor evaluates the output against the rubric's
**`RubricPostProcessor`** — If a node has a `Rubric`, this processor evaluates the output against the rubric's
criteria using the `RubricEngine`. It stores the evaluation result in the state for use by `ScoreTransition` rules. If
the evaluation fails and no explicit transition handles the low score, it can trigger an **auto-backtrack** to a prior
step, enabling self-correcting loops. On auto-backtrack, sets `state.nodeRedirected = true` — downstream processors
@@ -710,15 +710,16 @@ The rubric engine evaluates output quality against defined criteria with score-b

| Component | Description |
|----------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
| `RubricEngine` | Orchestrates evaluation using repository and evaluator |
| `RubricRepository` | Stores rubric definitions (in-memory by default) |
| `RubricEngine` | Orchestrates evaluation using evaluator |
| `RubricEvaluator` | Evaluates output against criteria |
| `ScoreExtractingEvaluator` | Reads the `score` engine variable from context; accumulates `recommendation` feedback for failing criteria into `_rubric_criterion_feedback` |
| `Rubric` | Immutable definition with pass threshold and weighted criteria |
| `Criterion` | Single evaluation dimension with weight and minimum score |

### How Evaluation Works

Rubrics are parsed at build time and stored directly on the node as typed `Rubric` objects.

`ScoreExtractingEvaluator` reads the `score` engine variable directly from the execution context. The score is extracted automatically by `OutputExtractionPostProcessor` whenever the node has a `ScoreTransition` — no JSON parsing is needed in the evaluator itself.

If the score falls below a criterion's minimum and a `recommendation` engine variable is present in context, the text is appended to `_rubric_criterion_feedback`. `RubricPostProcessor` uses that list to assemble a combined backtrack context update for self-correcting loops.
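
The feedback rule described above can be sketched as follows. This is a minimal illustration only, not the project's `ScoreExtractingEvaluator`: the class name `CriterionFeedbackSketch`, the `accumulate` method, and the use of a plain `Map` as the execution context are all assumptions made for the example. Only the variable names `score`, `recommendation`, and `_rubric_criterion_feedback` come from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real evaluator and context types are not shown in this diff.
final class CriterionFeedbackSketch {
    static final String FEEDBACK_KEY = "_rubric_criterion_feedback";

    // Appends the recommendation text to the feedback list when the score is
    // below the criterion's minimum and a recommendation is present in context.
    static void accumulate(Map<String, Object> context, double minimumScore) {
        Object score = context.get("score");
        Object recommendation = context.get("recommendation");
        if (!(score instanceof Number) || !(recommendation instanceof String)) {
            return; // nothing to evaluate, or no feedback text to record
        }
        if (((Number) score).doubleValue() >= minimumScore) {
            return; // criterion met, no feedback accumulated
        }
        @SuppressWarnings("unchecked")
        List<String> feedback =
                (List<String>) context.computeIfAbsent(FEEDBACK_KEY, k -> new ArrayList<String>());
        feedback.add((String) recommendation);
    }
}
```

A post-processor could then join the accumulated entries into a single context update before backtracking; how `RubricPostProcessor` actually assembles that update is not visible in this diff.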
@@ -861,7 +862,7 @@ Each injector fires when **either** of two conditions is met:

```mermaid
flowchart LR
r(["RubricPromptInjector\n· rubricId != null"]) --> s(["ScoreVariableInjector\n· ScoreTransition or\nconsensus branch"])
r(["RubricPromptInjector\n· rubric != null"]) --> s(["ScoreVariableInjector\n· ScoreTransition or\nconsensus branch"])
s --> a(["ApprovalVariableInjector\n· ApprovalTransition or\nconsensus branch"])
a --> rec(["RecommendationVariable\nInjector\n· Score/Approval or\nconsensus branch"])
rec --> w(["WritesVariableInjector\n· has writes()"])
27 changes: 4 additions & 23 deletions docs/developer-guide-server.md
@@ -970,7 +970,6 @@ Every integration test extends `IntegrationTestBase`, which provides:
| `pushAndExecuteWithMcp(workflow, ctx, endpoint)` | Executes with MCP-enabled tenant context |
| `registerStub(key, response)` | Programmatic stub registration by node ID or agent ID |
| `registerStub(scenario, key, response)` | Scenario-specific stub registration |
| `resolveRubricPath(resourceName)` | Copies classpath rubric to temp file for `RubricParser` |

#### Writing an Integration Test

@@ -1098,28 +1097,10 @@ Place workflow definitions in `src/test/resources/workflows/`. Use `model: "stub

#### Rubric Testing

Pre-register parsed rubrics so the executor skips filesystem path resolution:

```java
private Rubric parseAndRegisterRubric(String rubricId, String resourceName) {
String rubricPath = resolveRubricPath(resourceName);
Rubric parsed = RubricParser.parse(Path.of(rubricPath));

Rubric rubric = Rubric.builder()
.id(rubricId)
.name(parsed.getName())
.version(parsed.getVersion())
.type(parsed.getType())
.passThreshold(parsed.getPassThreshold())
.criteria(parsed.getCriteria())
.build();

hensuEnvironment.getRubricRepository().save(rubric);
return rubric;
}
```

Place rubric markdown files in `src/test/resources/rubrics/`.
Rubrics are parsed at build time and stored directly on workflow nodes. Workflow JSON fixtures
include inline rubric content in the `"rubric"` field, which the deserializer parses into typed
`Rubric` objects at load time. No separate registration step is needed — just call
`loadWorkflow("fixture.json")` and the rubric is ready.

#### Repository Tests (Testcontainers)

54 changes: 18 additions & 36 deletions docs/dsl-reference.md
@@ -51,10 +51,6 @@ fun myWorkflow() = workflow("WorkflowName") {
// Agent definitions
}

rubrics {
// Rubric references (optional)
}

config {
// Execution settings (optional)
}
@@ -78,7 +74,6 @@ fun myWorkflow() = workflow("WorkflowName") {
|---------------|--------------------------------------------------------------------------------------------------------------------------|
| `state { }` | Optional typed state schema. When declared, enables load-time validation of `writes` and `{variable}` prompt references. |
| `agents { }` | Agent definitions (models, roles, temperatures) |
| `rubrics { }` | Rubric file references for quality evaluation |
| `config { }` | Workflow execution settings |
| `graph { }` | Node graph (required) |

@@ -155,7 +150,7 @@ Standard nodes execute an agent with a prompt and transition based on the result
node("node-id") {
agent = "agent-id"
prompt = "Your prompt with {placeholders}"
rubric = "rubric-id" // Optional
rubric = "rubric-id.md" // Optional
writes("param1", "param2") // Optional — declare state variables this node produces

review(ReviewMode.OPTIONAL) // Optional
@@ -169,11 +164,11 @@ node("node-id") {

#### Standard Node Properties

| Property | Type | Required | Description |
|-------------|---------|----------|-----------------------------------------|
| `agent` | String? | Yes | ID of the agent to execute |
| `prompt` | String? | Yes | Prompt template or `.md` file reference |
| `rubric` | String? | No | ID of rubric to evaluate output quality |
| Property | Type | Required | Description |
|----------|---------|----------|----------------------------------------------------------------|
| `agent` | String? | Yes | ID of the agent to execute |
| `prompt` | String? | Yes | Inline prompt template or `.md` file reference from `prompts/` |
| `rubric` | String? | No | Inline rubric content or `.md` file reference from `rubrics/` |

#### Standard Node Functions

@@ -219,13 +214,13 @@ parallel("review-committee") {

#### Branch Properties

| Property | Type | Required | Description |
|------------|---------------|----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `agent` | String | Yes | ID of the agent to execute |
| `prompt` | String? | No | Prompt template or `.md` file reference |
| `rubric` | String? | No | ID of rubric for branch evaluation. When set, the rubric's pass/fail result determines the branch's consensus vote (APPROVE/REJECT), overriding text-based heuristics |
| `weight` | Double | No | Vote weight for `WEIGHTED_VOTE` consensus strategy. Higher values give more influence to the branch score (default: 1.0) |
| `yields()` | vararg String | No | State variable names this branch produces as structured domain output. The agent's JSON response must include these fields; the engine extracts and merges them into workflow state |
| Property | Type | Required | Description |
|------------|---------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `agent` | String | Yes | ID of the agent to execute |
| `prompt` | String? | No | Inline prompt template or `.md` file reference from `prompts/` |
| `rubric` | String? | No | Inline rubric content or `.md` file reference from `rubrics/`. When set, the rubric's pass/fail result determines the branch's consensus vote (APPROVE/REJECT), overriding text-based heuristics |
| `weight` | Double | No | Vote weight for `WEIGHTED_VOTE` consensus strategy. Higher values give more influence to the branch score (default: 1.0) |
| `yields()` | vararg String | No | State variable names this branch produces as structured domain output. The agent's JSON response must include these fields; the engine extracts and merges them into workflow state |

#### Rubric-Based Consensus

@@ -335,7 +330,7 @@ generic("validate-input") {
"required" to true
}

rubric = "validation-rubric" // Optional
rubric = "validation-rubric.md" // Optional

onSuccess goto "process"
onFailure retry 2 otherwise "error"
@@ -725,7 +720,7 @@ Because `recommendation` is an engine variable, you reference it as a `{placehol
node("score-content") {
agent = "reviewer"
prompt = "Review the article: {article}\nOutput a score and feedback."
rubric = "content-quality"
rubric = "content-quality.md"

onScore {
whenScore greaterThanOrEqual 80.0 goto "publish"
@@ -746,24 +741,15 @@ node("revise") {
```

## Rubrics
Rubrics define quality evaluation criteria for node outputs. Rubric files should be placed in the `rubrics/` directory of your working directory.

Rubrics define quality evaluation criteria for node outputs.

```kotlin
rubrics {
rubric("quality-check", "quality.md") // rubrics/quality.md
rubric("pr-review", "templates/pr.md") // rubrics/templates/pr.md
rubric("docs") { file = "documentation.md" } // Alternative syntax
}
```

Reference rubrics in nodes:
To use a rubric, specify the filename in the node's `rubric` property:

```kotlin
node("review") {
agent = "reviewer"
prompt = "Review this code"
rubric = "quality-check"
rubric = "content-quality.md" // References rubrics/content-quality.md

onScore {
whenScore greaterThanOrEqual 80.0 goto "approve"
@@ -1094,10 +1080,6 @@ fun contentPipeline() = workflow("ContentPipeline") {
}
}

rubrics {
rubric("content-quality", "content-quality.md")
}

graph {
start at "research"

2 changes: 1 addition & 1 deletion docs/unified-architecture.md
@@ -760,7 +760,7 @@ CDI wiring, `WorkflowExecutor`, `TenantContext` — against in-memory repositori
profile disables PostgreSQL, Flyway, and the scheduler (no Docker required).

All integration tests extend `IntegrationTestBase`, which provides CDI injection, per-test
state cleanup, and helpers (`registerStub`, `pushAndExecute`, `resolveRubricPath`).
state cleanup, and helpers (`registerStub`, `pushAndExecute`).

### Repository Tests (Testcontainers PostgreSQL)

Original file line number Diff line number Diff line change
@@ -190,20 +190,18 @@ private void displayHelp() {
private ReviewDecision handleBacktrack(ReviewData data) {
List<ReviewData.StepInfo> steps = data.historySteps();

if (steps == null || steps.size() <= 1) {
if (steps == null || steps.isEmpty()) {
println(styles.warn("No previous steps available to backtrack to."));
return null;
}

List<ReviewData.StepInfo> validSteps = steps.subList(0, steps.size() - 1);

println("");
println(styles.boxTopWithLabel(styles.dim("backtrack target")));
println("");

for (int i = validSteps.size() - 1; i >= 0; i--) {
ReviewData.StepInfo step = validSteps.get(i);
int displayNum = validSteps.size() - i;
for (int i = steps.size() - 1; i >= 0; i--) {
ReviewData.StepInfo step = steps.get(i);
int displayNum = steps.size() - i;
boolean ok = "SUCCESS".equals(step.status());
String status = styles.successOrError(ok ? "OK" : "FAIL", ok);
println(String.format(" [%d] %s (%s)", displayNum, step.nodeId(), status));
@@ -217,12 +215,12 @@ private ReviewDecision handleBacktrack(ReviewData data) {
try {
int choice = Integer.parseInt(input);
if (choice == 0) return null;
if (choice < 1 || choice > validSteps.size()) {
if (choice < 1 || choice > steps.size()) {
println(styles.error("Invalid choice. Please select a number from the list."));
return null;
}

ReviewData.StepInfo targetStep = validSteps.get(validSteps.size() - choice);
ReviewData.StepInfo targetStep = steps.get(steps.size() - choice);

print("Reason for backtracking (optional): ");
String reason = readInput();
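
The hunk above numbers the backtrack targets newest-first and maps the user's choice back to a list index with `steps.size() - choice`. The standalone sketch below isolates that arithmetic so it can be checked on its own. The class `BacktrackMenuSketch` and its methods are hypothetical names for illustration and are not part of the PR.

```java
import java.util.List;

// Hypothetical sketch of the reverse-numbered menu mapping used in handleBacktrack.
final class BacktrackMenuSketch {
    // Menu number shown for the step at list index `index` (the newest step gets 1).
    static int displayNumber(int size, int index) {
        return size - index;
    }

    // Resolves a 1-based menu choice back to its step; returns null when out of range.
    static <T> T resolve(List<T> steps, int choice) {
        if (steps == null || choice < 1 || choice > steps.size()) {
            return null;
        }
        return steps.get(steps.size() - choice);
    }

    public static void main(String[] args) {
        List<String> steps = List.of("research", "write", "review");
        // Iterate newest-first, as the menu in the diff does.
        for (int i = steps.size() - 1; i >= 0; i--) {
            System.out.printf("[%d] %s%n", displayNumber(steps.size(), i), steps.get(i));
        }
        System.out.println("choice 1 resolves to " + resolve(steps, 1));
    }
}
```

With the `subList` trimming removed, the final entry in the history is selectable, so choice 1 resolves to the last element of `steps` rather than the one before it.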
Original file line number Diff line number Diff line change
@@ -120,11 +120,14 @@ private String renderNode(Node node, String nodeId, String indent, AnsiStyles st
"agent",
styles.bold(standardNode.getAgentId())));
}
if (standardNode.getRubricId() != null) {
if (standardNode.getRubric() != null) {
sb.append(
String.format(
"%s%s %-9s %s%n",
indent, styles.boxMid(), "rubric", standardNode.getRubricId()));
indent,
styles.boxMid(),
"rubric",
standardNode.getRubric().getCriteria().size() + " criteria"));
}
if (standardNode.getReviewConfig() != null) {
sb.append(
@@ -271,11 +274,14 @@ private String renderNode(Node node, String nodeId, String indent, AnsiStyles st
styles.accent(String.valueOf(genericNode.getConfig().size()))
+ " entries"));
}
if (genericNode.getRubricId() != null) {
if (genericNode.getRubric() != null) {
sb.append(
String.format(
"%s%s %-9s %s%n",
indent, styles.boxMid(), "rubric", genericNode.getRubricId()));
indent,
styles.boxMid(),
"rubric",
genericNode.getRubric().getCriteria().size() + " criteria"));
}
appendTransitions(sb, indent, genericNode.getTransitionRules(), styles);
}