UX-Forge 🛠️

A decoupled, agentic UX/UI framework. A pure-Python orchestrator that turns a niche or an existing website into a working React/Next.js + TailwindCSS frontend — by delegating the heavy lifting to console AI agents instead of doing it itself.

UX-Forge is the "Jefe de Máquinas" (foreman): it never browses the web and never writes frontend code with its own hands. It plans, delegates, captures results, judges them, and re-delegates corrections.

Why this design

The core architectural bet: orchestration is a separate concern from execution. A small, robust, well-tested Python core coordinates two specialized CLI agents that already live on the host:

Agent	Role	What it does
Goose	Researcher / Scraper	Browses the web, scrapes pages, extracts DOM/CSS/colors/typography
Aider	Coder	Generates and edits frontend files from a spec

The orchestrator adds only the thin reasoning layer via direct LLM calls (litellm): turning raw research into a design system, and judging the generated code. Everything else is delegation.

This keeps the orchestrator:

Model-agnostic — swap ollama/qwen3-14b-32k for gpt-4o with one flag.
Hang-proof — every delegated process has a hard asyncio timeout and is killed and reaped on overrun (so a stuck agent never holds the GPU).
Portable — paths come from $UX_FORGE_HOME, so the same file runs on an EC2 host or a laptop.

The 4-Phase Engine

The pipeline is strict-sequential and identical for both modes — only the prompts change. A failure in any phase aborts the run with a non-zero exit code (fail fast: never build on broken research).

Phase 1  BENCHMARKER  → Goose      → reports/benchmark.md
Phase 2  PLANNER      → LLM        → reports/ui_plan.md      (design tokens + components)
Phase 3  EXECUTOR     → Aider      → output_app/             (React/Next.js + Tailwind)
Phase 4  EVALUATOR    → LLM→Aider  → reports/ux_review.md    (+ corrective edits)

Benchmarker (Goose). Either researches a niche or scrapes a target URL. Writes a structured Markdown report.
Planner (LLM). Reads the report and produces a Design System: a ## Design Tokens block (color palette as named hex tokens, typography scale, spacing, radius), a ## Component List, and ## Implementation Notes mapping tokens to a tailwind.config.js theme.
Executor (Aider). Runs inside output_app/ and builds every component, the Tailwind config, and a composed sample page from the plan.
Evaluator (LLM → Aider). Concatenates the generated source into a bounded digest, asks an LLM to compare it against the intended design system, and emits VERDICT: PASS or VERDICT: FIX. On FIX, it extracts the correction block and re-invokes Aider for a surgical corrective pass.

Two Operating Modes

UX-Forge selects its mode from the CLI args. --target and --url are mutually exclusive (enforced at parse time); --modify only applies in clone mode.

Mode A — Niche Creation (Greenfield)

Research the leading products in a niche and build a fresh, generic design system inspired by them.

python3 ux_agent.py --target "SaaS Finanzas"

Goose researches the 3 leading apps in the niche → Planner synthesizes a generic design system → Aider builds it.

Mode B — Clone / Modify an Existing Site

Scrape an exact URL, reconstruct its design system, then apply a requested modification on top — without losing the original identity.

python3 ux_agent.py --url "https://stripe.com" \
    --modify "haz que los botones sean redondeados y el fondo oscuro"

Goose produces a faithful UX/UI X-ray (DOM structure, CSS classes, exact hex colors, typography) → Planner reconstructs the system and folds the modification into the tokens (e.g. dark bg/surface, large border-radius) → Aider builds the modified clone.

The modification is baked into the plan at Phase 2, not bolted onto the build prompt at Phase 3 — so Aider sees one coherent spec, never a contradictory "clone faithfully BUT change X" that weak models tend to drop.

Install (on EC2 or locally)

Assumes aider and goose are already on $PATH.

cd /home/ubuntu/ux_forge
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env          # edit if needed
set -a; source .env; set +a

Usage

# Mode A — niche
python3 ux_agent.py --target "SaaS Finanzas"

# Mode B — clone + modify
python3 ux_agent.py --url "https://stripe.com" --modify "dark mode, rounded buttons"

# Use OpenAI instead of the local model
python3 ux_agent.py --target "SaaS Finanzas" --model gpt-4o

# Skip the review/corrective pass
python3 ux_agent.py --target "SaaS Finanzas" --no-eval

Smoke test (no agents required)

python3 ux_agent.py --help              # arg parsing + mode mutex
python3 -c "import ux_agent; print('ok')"

Configuration

Env var	Default	Purpose
`UX_FORGE_HOME`	`/home/ubuntu/ux_forge`	Base working directory
`UX_FORGE_MODEL`	`ollama/qwen3-14b-32k`	LLM for Planner & Evaluator
`OLLAMA_API_BASE`	`http://localhost:11434`	Ollama endpoint
`OPENAI_API_KEY`	—	Only if using `--model gpt-4o`

Timeouts are fields on the Config dataclass (goose_timeout, aider_timeout, llm_timeout, aider_fix_timeout). Agent invocation lives in build_goose_cmd / build_aider_cmd — the single place to change agent flags.

Known caveats (verify before a real run)

Goose flag form. build_goose_cmd defaults to goose run --text "...". Some Goose builds expect an instructions file (--instructions <path>). Check goose run --help on your host.
Aider in an empty dir. Aider prefers an existing file/git repo to add to. git init inside output_app/ (or pre-touch a placeholder) before Phase 3.
JS-heavy / bot-protected sites (e.g. Stripe, Cineplanet). Goose must render headless, not just fetch HTML, or the recon comes back with empty or invented colors. If clone recon returns fast with thin output, that's the tell.
Buffered output. run_cli captures via communicate(), so a long phase shows no live output until the agent exits — silence ≠ hung.
Single model on a T4. Planner and Evaluator each make one completion and don't overlap with Aider, so VRAM stays bounded. Don't run two pipelines at once on the same GPU.

Project layout

ux_forge/
├── ux_agent.py        # the orchestrator (single-file core)
├── requirements.txt   # litellm only — the rest is stdlib
├── .env.example       # config template (no secrets)
└── README.md

Logs are structured: [TIMESTAMP] [LEVEL] [ux-forge] [ACTION] message.

Design Journey — how this was built

This framework was designed and built in a single working session as a zero-context greenfield, and the path there is worth recording because the process shaped the architecture.

The brief. Build an agentic UX/UI framework for an AWS EC2 (Ubuntu 22.04, pure Python 3.10+). Hard constraint: no heavy orchestration frameworks (no LangChain, no CrewAI) — only the standard library plus litellm. The Python code must act purely as a foreman delegating to the pre-installed aider and goose CLIs.
The first build (v1). Produced ux_agent.py with the 4-phase engine, a frozen Config dataclass as the single source of truth, structured logging, hang-proof asyncio subprocess handling (kill + reap on timeout), and a litellm wrapper running blocking calls in a thread so the event-loop timeout machinery stays responsive. Shipped with requirements.txt, .env.example, and a README. Smoke-tested: --help and import clean.
A governance detour. The host enforces a pre-commit-style hook that blocks the assistant from writing application code directly — the standing rule is "write a Unit spec and delegate to Aider, no self-exceptions." The hook fired and blocked ux_agent.py. Rather than bypass it silently, the conflict was surfaced: this file is the orchestrator itself, not app code being patched mid-flow, and delegating a 400-line async framework to a 9B model on a T4 would produce far worse results. The user reviewed and explicitly authorized a scoped exception for ux_forge/. Governance was respected, not circumvented.
The refactor (v2): two modes. The framework was extended to support both Mode A (niche creation) and Mode B (clone + modify). The key design decision: branch only at the prompt-building layer (two small helper methods), keeping the pipeline driver completely mode-agnostic. argparse gained a mutually-exclusive required group (--target XOR --url) and a clone-only --modify, all validated at parse time. Every guard was smoke-tested.
First real clone attempt. A run against https://www.cineplanet.com.pe/ appeared to "hang" after the START log. Diagnosis: not hung — run_cli buffers output until the agent exits, so Phase 1 is silent by design while Goose scrapes. This surfaced the broader lesson about JS-heavy, bot-protected sites needing a true headless render. The findings are captured in the Known caveats above.
Model routing reality. It became clear the local 9B model lacks the reasoning depth for greenfield generation of this size — reinforcing the framework's own model-routing philosophy: strong models for design and judgment, delegated agents for mechanical execution.

The throughline: a robust orchestrator is worth more than a clever monolith. Keep the coordinator small, typed, and hang-proof; push the messy, model-heavy work out to specialized agents behind hard timeouts.

License

MIT (or your preference).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

UX-Forge 🛠️

Why this design

The 4-Phase Engine

Two Operating Modes

Mode A — Niche Creation (Greenfield)

Mode B — Clone / Modify an Existing Site

Install (on EC2 or locally)

Usage

Smoke test (no agents required)

Configuration

Known caveats (verify before a real run)

Project layout

Design Journey — how this was built

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
ux_agent.py		ux_agent.py

Folders and files

Latest commit

History

Repository files navigation

UX-Forge 🛠️

Why this design

The 4-Phase Engine

Two Operating Modes

Mode A — Niche Creation (Greenfield)

Mode B — Clone / Modify an Existing Site

Install (on EC2 or locally)

Usage

Smoke test (no agents required)

Configuration

Known caveats (verify before a real run)

Project layout

Design Journey — how this was built

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages