GitHub - ramn51/titan-orchestrator: A zero-dependency distributed orchestrator built from scratch. Bridges static DevOps pipelines and dynamic Agentic AI workflows via a Python SDK & CLI. Features reactive auto-scaling and capability-based routing

A distributed execution runtime built from first principles — custom TCP protocol, DAG scheduler, AOF-backed store, and agentic runtime in a single zero-dependency JAR.

5-Minute Quickstart · Full Documentation · How Titan Compares · Architecture

Titan is a distributed execution runtime with a custom DAG scheduler, binary wire protocol, AOF-backed KV store, and agentic execution engine. The core engine is a single JAR with zero external dependencies — it schedules, routes, and executes across a cluster with nothing else installed. The dashboard (Flask) and MCP server (mcp package) are optional extensions on top. It runs on a bare VM or laptop.

Submit jobs in YAML or Python. Titan resolves dependencies, routes to capable workers, streams logs back, and recovers from crashes via AOF replay. At the agentic tier, tasks can spawn new tasks mid-execution, share state across nodes through TitanStore, and pause for human approval before continuing.

It covers three capability tiers in one binary:

Tier	What you run
T1 — Distributed Task Scheduler	Batch jobs, static DAGs, GPU/CPU-routed workloads, delayed execution
T2 — Service Orchestrator	Long-running APIs and daemons with auto-restart and port management
T3 — Agentic Runtime	Self-mutating DAGs, LLM-driven agents, multi-agent pipelines with HITL gates

v1.0 research status. Single-Master topology (Raft on the v2 roadmap), process-level isolation (Docker on v2), no mTLS yet. Built to be understood, not to replace Kubernetes or Temporal in production today.

Dashboard

Titan ships with a built-in Python Flask dashboard. Four views:

Orchestrator — Cluster health and worker load

Real-time view of every connected worker — capability tag (GENERAL / GPU / HIGH_MEM), active job count, running services, and recent activity. Includes a + Launch Worker button to spin up new nodes from the browser.

DAG Pipelines — Live dependency graph

Every pipeline submitted to the cluster — via CLI, SDK, YAML, or the visual Constructor — is automatically rendered as a live dependency graph. Node colours update in real-time as jobs move through PENDING → RUNNING → COMPLETED / FAILED. Click any node to stream stdout/stderr live.

DAG Constructor — Visual pipeline builder

Browser-based drag-and-drop DAG editor. Add task and service nodes, draw dependency edges, configure script, capability, priority, and HITL gates per node — then deploy directly to the cluster in one click. Auto-generates equivalent Python SDK and YAML as you build.

Agent Runs — Multi-stage agent timeline

Groups all DAG stages that share an agent_run_id into a single timeline row. When an agent runs iteratively (PLAN → ITER → EVAL → SYNTH), each stage is a separate DAG submission — Agent Runs reconstructs the full lifecycle so you don't have to hunt through individual entries.
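Below is a minimal sketch of that grouping. It assumes each DAG submission carries an optional `agent_run_id`, a stage label, and a submission timestamp. The `DagSubmission` record, its field names, and `group_agent_runs` are hypothetical illustrations, not Titan's data model. Only `agent_run_id` and the stage names come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DagSubmission:
    # Hypothetical record for one DAG submission made by one agent stage.
    dag_id: str
    agent_run_id: Optional[str]  # submissions without a run id are not grouped
    stage: str                   # e.g. "PLAN", "ITER", "EVAL", "SYNTH"
    submitted_at: float          # epoch seconds


def group_agent_runs(subs: List[DagSubmission]) -> Dict[str, List[DagSubmission]]:
    """Collect submissions that share an agent_run_id into one time-ordered timeline."""
    runs: Dict[str, List[DagSubmission]] = defaultdict(list)
    for sub in subs:
        if sub.agent_run_id is not None:
            runs[sub.agent_run_id].append(sub)
    # Sorting by submission time reconstructs the run's stage order.
    return {rid: sorted(stages, key=lambda s: s.submitted_at) for rid, stages in runs.items()}


subs = [
    DagSubmission("d3", "run-1", "EVAL", 30.0),
    DagSubmission("d1", "run-1", "PLAN", 10.0),
    DagSubmission("d2", "run-1", "ITER", 20.0),
    DagSubmission("d9", None, "ADHOC", 15.0),
    DagSubmission("d4", "run-2", "PLAN", 12.0),
]
timeline = group_agent_runs(subs)
print([s.stage for s in timeline["run-1"]])  # ['PLAN', 'ITER', 'EVAL']
```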

Four Ways to Define a Pipeline

Method	Best for
YAML file	Repeatable, version-controlled pipelines. Commit to git and re-run any time.
Python SDK	Programmatic pipelines where shape is determined at runtime — agent loops, dynamic fan-out.
Visual Constructor	Building pipelines without writing code. Drag nodes, draw edges, deploy in one click.
MCP (natural language)	Controlling Titan from Claude Desktop or Cursor. Describe what you want — the agent writes the scripts and submits the DAG on your behalf.

All four paths produce the same result: a tracked DAG in the visualizer with per-job logs, status, and workspace files.

Demos

Visual DAG Constructor

Build a pipeline by dragging nodes and drawing edges — deploy to the cluster with one click.

Human-in-the-Loop (HITL) Gate

A DAG pauses at a checkpoint and waits for a human Approve/Reject before downstream jobs resume.

HITL on a Complex Graph

HITL gate mid-execution on a multi-branch pipeline — shows how the visualizer reflects the paused state.

Agentic AI Workflow

A multi-stage agent loop — each stage is a separate DAG submission, grouped into one timeline in Agent Runs.

More: Dynamic DAG Execution, Reactive Scaling, GPU Routing, Fanout

Control Plane: Dynamic DAG Execution

Reactive Worker Scaling

GPU Affinity Routing

Parallel Execution (Fanout)

Full Load Cycle (Scale Up & Descale)

Quick Start

```bash
# 1. Build the engine
mvn clean package -DskipTests

# 2. Start the cluster (Master + 2 workers + TitanStore + dashboard)
./titan-dev.sh up

# 3. Open the dashboard (`open` is macOS-only; on Linux use xdg-open)
open http://localhost:5000
```

Submit your first job:

```python
from titan_sdk.titan_sdk import TitanClient, TitanJob

client = TitanClient()
client.submit_dag("hello-world", [
    TitanJob(job_id="hello", script_content="print('hello from Titan')")
])
```

Full walkthrough: 5-Minute Quickstart

Architecture

Three components:

Control Plane (Master) — DAG scheduling, dependency resolution, capability routing, AOF-backed state recovery
Workers — Capability-tagged execution nodes that self-register on startup and re-register after restart
TitanStore (optional) — AOF-backed KV store for crash recovery and cross-job shared agent state. No external database required.
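To make "AOF-backed" concrete, here is a toy append-only-file store. Each mutation is appended and fsynced before it is applied in memory, and a restart rebuilds state by replaying the file. This is a sketch under assumptions, not TitanStore's implementation. The JSON-lines record format, the `AofStore` class, and the torn-tail truncation policy are all illustrative choices.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class AofStore:
    """Toy KV store: log every mutation to an append-only file, replay it on startup."""

    def __init__(self, path: str):
        self.path = path
        self.data: Dict[str, str] = {}
        self._replay()
        # Append mode: existing history is never rewritten, only extended.
        self._log = open(path, "a", encoding="utf-8")

    def _replay(self) -> None:
        if not os.path.exists(self.path):
            return
        good_end = 0  # byte offset just past the last complete record
        with open(self.path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # torn final write from a crash: ignore it
                try:
                    entry = json.loads(raw)
                except ValueError:
                    break  # corrupt record: stop replay at the last good one
                self._apply(entry)
                good_end += len(raw)
        if good_end < os.path.getsize(self.path):
            os.truncate(self.path, good_end)  # drop the bad tail so new appends stay parseable

    def _apply(self, entry: dict) -> None:
        if entry["op"] == "SET":
            self.data[entry["key"]] = entry["value"]
        elif entry["op"] == "DEL":
            self.data.pop(entry["key"], None)

    def _append(self, entry: dict) -> None:
        self._log.write(json.dumps(entry) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # make the record durable before applying it
        self._apply(entry)

    def set(self, key: str, value: str) -> None:
        self._append({"op": "SET", "key": key, "value": value})

    def delete(self, key: str) -> None:
        self._append({"op": "DEL", "key": key})

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def close(self) -> None:
        self._log.close()


path = os.path.join(tempfile.mkdtemp(), "store.aof")
store = AofStore(path)
store.set("dag:hello", "RUNNING")
store.set("dag:hello", "COMPLETED")
store.close()

recovered = AofStore(path)  # simulates a restart: state is rebuilt by replay
print(recovered.get("dag:hello"))  # COMPLETED
recovered.close()
```

Truncating the torn tail matters. Without it, the next append would land on the same line as the partial record, and a later replay would stop there and lose everything written after the crash.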

The wire protocol (TITAN_PROTO) is a custom fixed-header binary format over raw TCP — no JSON on the dispatch path.
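The general idea of fixed-header framing over raw TCP fits in a few lines. The header layout below is hypothetical: magic, version, message type, and payload length, all big-endian. The same goes for every constant in it. TITAN_PROTO's actual field layout is not described in this README.

```python
import socket
import struct
from typing import Tuple

# Hypothetical 8-byte header (not the real TITAN_PROTO layout), big-endian:
# magic (uint16) | version (uint8) | msg_type (uint8) | payload_len (uint32)
HEADER = struct.Struct(">HBBI")
MAGIC = 0x7174
VERSION = 1
MSG_SUBMIT_JOB = 0x01  # hypothetical message-type id


def encode_frame(msg_type: int, payload: bytes) -> bytes:
    """Prefix the payload with a fixed-size header so the reader knows how much to read."""
    return HEADER.pack(MAGIC, VERSION, msg_type, len(payload)) + payload


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """TCP is a byte stream with no message boundaries, so loop until n bytes arrive."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def read_frame(sock: socket.socket) -> Tuple[int, bytes]:
    """Read one header, validate it, then read exactly payload_len bytes."""
    magic, version, msg_type, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    if magic != MAGIC or version != VERSION:
        raise ValueError("unrecognized frame header")
    return msg_type, recv_exact(sock, length)


a, b = socket.socketpair()
a.sendall(encode_frame(MSG_SUBMIT_JOB, b"hello-world"))
print(read_frame(b))  # (1, b'hello-world')
a.close()
b.close()
```

The reader consumes the fixed-size header first, then exactly `payload_len` bytes. That removes any need for delimiter scanning or text parsing on the dispatch path.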

Architecture Deep Dive

MCP — Control Titan from Claude Desktop

Titan ships a built-in MCP server. Connect Claude Desktop or Cursor and control your cluster in natural language:

"Research three approaches to distributed ML scheduling. Analyze gaps, methodology, and open problems in parallel. Synthesize into a report."

Titan executes the parallel jobs, fans results into a synthesis job, and returns the output — all from a single prompt. No terminal, no code.

MCP Setup and Use Cases

Key Features

Execution

Static DAGs via YAML or Python SDK
Dynamic DAG mutation at runtime — tasks can spawn new tasks mid-execution
Long-running services alongside batch jobs in one runtime
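Dependency resolution with runtime mutation reduces to a simple rule. A job becomes ready once all of its dependencies have completed. The ready set is recomputed after every round, so jobs added mid-run are picked up automatically. The sketch below runs jobs sequentially in one process and uses hypothetical `Job` and `Dag` classes. It illustrates the technique, not Titan's scheduler or its SDK API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Job:
    # Hypothetical job model: fn(dag) does the work and may add new jobs to the DAG.
    job_id: str
    fn: Callable[["Dag"], None]
    deps: List[str] = field(default_factory=list)
    status: str = "PENDING"  # PENDING -> RUNNING -> COMPLETED / FAILED


class Dag:
    def __init__(self) -> None:
        self.jobs: Dict[str, Job] = {}
        self.order: List[str] = []  # completion order, for inspection

    def add(self, job: Job) -> None:
        if job.job_id in self.jobs:
            raise ValueError(f"duplicate job id: {job.job_id}")
        self.jobs[job.job_id] = job

    def ready(self) -> List[Job]:
        """PENDING jobs whose dependencies all exist and have COMPLETED."""
        return [
            j for j in self.jobs.values()
            if j.status == "PENDING"
            and all(d in self.jobs and self.jobs[d].status == "COMPLETED" for d in j.deps)
        ]

    def run(self) -> None:
        while True:
            batch = self.ready()  # recomputed each round, so spawned jobs are picked up
            if not batch:
                break  # done, or the remaining jobs are blocked behind a failure
            for job in batch:  # a real scheduler would dispatch these to workers in parallel
                job.status = "RUNNING"
                try:
                    job.fn(self)  # may call self.add(...) to grow the DAG
                except Exception:
                    job.status = "FAILED"  # dependents of a failed job never become ready
                else:
                    job.status = "COMPLETED"
                    self.order.append(job.job_id)


def plan(dag: Dag) -> None:
    # A planning job that fans out three research jobs and one fan-in synthesis job.
    topics = ["a", "b", "c"]
    for t in topics:
        dag.add(Job(f"research-{t}", lambda d: None, deps=["plan"]))
    dag.add(Job("synth", lambda d: None, deps=[f"research-{t}" for t in topics]))


dag = Dag()
dag.add(Job("plan", plan))
dag.run()
print(dag.order)  # ['plan', 'research-a', 'research-b', 'research-c', 'synth']
```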

Routing & Scaling

Capability-based routing — tag workers GPU, HIGH_MEM, or custom; jobs are held until a matching node is free
Affinity routing — pin jobs to specific workers by tag
Least-connection dispatch across available workers
Reactive auto-scaling — when a worker's queue saturates, it spawns child worker processes on the same machine to absorb the spike; idle burst workers decommission automatically after 45 seconds
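Capability filtering and least-connection dispatch compose naturally. First keep only the workers that carry the job's required tag and have a free slot. Then pick the one with the fewest active jobs. If none qualifies, the job stays queued. In the sketch below, the `Worker` fields, the per-worker `max_slots` limit, and `pick_worker` are hypothetical. The capability tag names come from the dashboard description above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Worker:
    worker_id: str
    capabilities: FrozenSet[str]  # e.g. {"GENERAL"}, {"GPU"}, {"HIGH_MEM"}
    active_jobs: int = 0
    max_slots: int = 4  # hypothetical per-worker concurrency limit


def pick_worker(workers: List[Worker], required: str) -> Optional[Worker]:
    """Least-connection choice among capable workers with spare capacity.

    Returning None means no matching node is free, so the caller keeps the job queued.
    """
    capable = [w for w in workers if required in w.capabilities and w.active_jobs < w.max_slots]
    if not capable:
        return None
    return min(capable, key=lambda w: w.active_jobs)  # ties go to the first listed worker


workers = [
    Worker("w1", frozenset({"GENERAL"}), active_jobs=2),
    Worker("w2", frozenset({"GENERAL"}), active_jobs=0),
    Worker("gpu1", frozenset({"GPU"}), active_jobs=4),
]
print(pick_worker(workers, "GENERAL").worker_id)  # w2
print(pick_worker(workers, "GPU"))  # None: gpu1 is saturated, so the GPU job waits
```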

Worker Lifecycle

Permanent vs. ephemeral workers — mark nodes as permanent so they stay alive across job completions; burst workers exit when idle
Workers re-register with the Master on reconnect — the cluster heals through a Master restart without manual intervention
Orphan cleanup on startup — workers scan for and kill leftover processes from a previous crash before accepting new work
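The scale-up and idle-decommission rules can be expressed as a small policy object. In this sketch, the queue-depth threshold, the `ScalingPolicy` class, and the one-burst-worker-per-check rule are assumptions. Only the 45-second idle window and the permanent/burst distinction come from this README. Actual process spawning is left to the caller. The clock is injectable, so the policy can be tested without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

IDLE_TIMEOUT_S = 45.0  # idle burst workers are decommissioned after 45 seconds


@dataclass
class WorkerState:
    permanent: bool
    busy: bool = False
    last_active: float = 0.0


class ScalingPolicy:
    """Hypothetical reactive scaling policy; the caller starts/stops the actual processes."""

    def __init__(self, queue_threshold: int, clock: Callable[[], float] = time.monotonic):
        self.queue_threshold = queue_threshold  # hypothetical "saturated" queue depth
        self.clock = clock
        self.workers: Dict[str, WorkerState] = {}
        self._burst_count = 0

    def register(self, worker_id: str, permanent: bool) -> None:
        self.workers[worker_id] = WorkerState(permanent, last_active=self.clock())

    def set_busy(self, worker_id: str, busy: bool) -> None:
        w = self.workers[worker_id]
        w.busy = busy
        w.last_active = self.clock()  # idle time is measured from the last state change

    def on_queue_depth(self, depth: int) -> List[str]:
        """Scale up: request one burst worker per check while the queue is saturated."""
        if depth < self.queue_threshold:
            return []
        self._burst_count += 1
        worker_id = f"burst-{self._burst_count}"
        self.register(worker_id, permanent=False)
        return [worker_id]  # the caller would spawn a child worker process for this id

    def reap_idle(self) -> List[str]:
        """Scale down: retire burst workers idle longer than IDLE_TIMEOUT_S."""
        now = self.clock()
        expired = [
            wid for wid, w in self.workers.items()
            if not w.permanent and not w.busy and now - w.last_active > IDLE_TIMEOUT_S
        ]
        for wid in expired:
            del self.workers[wid]
        return expired


now = [0.0]  # fake clock for the demo
policy = ScalingPolicy(queue_threshold=10, clock=lambda: now[0])
policy.register("w-perm", permanent=True)
print(policy.on_queue_depth(3))   # []
print(policy.on_queue_depth(12))  # ['burst-1']
now[0] = 50.0
print(policy.reap_idle())         # ['burst-1'], while the permanent worker stays
```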

Resilience

AOF crash recovery — Master replays state on restart, resumes in-flight DAGs
Worker re-registration — cluster recovers through a Master restart without manual intervention
Callback retry with exponential backoff
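Retry with exponential backoff doubles the wait after each failed attempt, up to a cap. The README does not give Titan's base delay, cap, or attempt limit, so the values below are placeholders. The random "full jitter" is an added choice that keeps many workers from retrying in lockstep. `retry_with_backoff` is a hypothetical helper, not part of titan_sdk.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,      # placeholder limit
    base_delay: float = 0.5,    # placeholder first delay, in seconds
    max_delay: float = 30.0,    # placeholder cap
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, waiting up to base_delay * 2**attempt between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter: random wait in [0, cap]
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_callback() -> str:
    # Simulated callback to the Master that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("master unreachable")
    return "ack"


delays: list = []
print(retry_with_backoff(flaky_callback, sleep=delays.append))  # ack
print(len(delays))  # 2
```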

Observability

Live DAG visualizer with per-node status
Real-time log streaming from any job
Agent Runs timeline for multi-stage agent workflows

Human-in-the-Loop

Native HITL gates — pause a DAG at any checkpoint, Approve/Reject from the dashboard
Configurable timeout (default 48 hours)
Automatic gate injection via SDK
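A HITL gate is essentially a one-shot decision with a deadline. Downstream work blocks until a human approves or rejects, or until the timeout expires. The sketch below uses a `threading.Event`. The `HitlGate` class and its method names are assumptions. So is treating a timeout as a `TIMED_OUT` (not approved) outcome, since the README only specifies the Approve/Reject actions and the 48-hour default.

```python
import threading
from typing import Optional

DEFAULT_TIMEOUT_S = 48 * 60 * 60  # README default: 48 hours


class HitlGate:
    """Hypothetical approval gate: downstream jobs call wait(); the dashboard calls resolve()."""

    def __init__(self, gate_id: str, timeout_s: float = DEFAULT_TIMEOUT_S):
        self.gate_id = gate_id
        self.timeout_s = timeout_s
        self._decided = threading.Event()
        self._lock = threading.Lock()
        self._decision: Optional[str] = None

    def resolve(self, approved: bool) -> None:
        """Record a human Approve/Reject; the first decision wins, later ones are ignored."""
        with self._lock:
            if self._decision is None:
                self._decision = "APPROVED" if approved else "REJECTED"
                self._decided.set()

    def wait(self) -> str:
        """Block until a decision arrives or the timeout expires."""
        if not self._decided.wait(self.timeout_s):
            with self._lock:
                if self._decision is None:  # a late click may still have won the race
                    self._decision = "TIMED_OUT"  # assumption: a timeout counts as not approved
                    self._decided.set()
        return self._decision


gate = HitlGate("approve-training", timeout_s=5)
threading.Timer(0.1, gate.resolve, args=(True,)).start()  # simulated Approve click
print(gate.wait())  # APPROVED
print(HitlGate("quick-gate", timeout_s=0.05).wait())  # TIMED_OUT
```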

Cloud Deployment

Titan runs locally out of the box. When you're ready to move to the cloud, package_cloud.sh builds two deployment bundles:

```bash
./package_cloud.sh
# → titan-master-bundle.zip   (~2.3 MB): everything needed on the Master VM
# → titan-worker-bundle.zip   (~120 KB): Worker.jar + titan_sdk for remote workers
```

Multi-VM Setup — permanent cluster on GCP / AWS / Azure
Remote GPU via SSH Tunnel — keep your local machine as Master, tunnel a RunPod or cloud VM as a worker with no open ports

Examples

Example	What it shows
Build Your First Agent	Writer → Critic loop — simplest agentic pattern in ~60 lines
Human-in-the-Loop Pipeline	ML pipeline that pauses for human Approve/Reject before training
Multi-Agent Research Pipeline	Parallel agents + HITL gate + synthesis fan-in
Static YAML Pipelines	Diamond patterns, GPU routing, parallel fan-out
LangChain Integration	Wrap the Titan SDK as LangChain tools — no MCP needed

Contributing

Titan is an experimental runtime built from first principles by a single developer. Bug reports, edge-case findings, and contributions are encouraged.

Report an Issue · How to Contribute
