A decoupled, agentic UX/UI framework. A pure-Python orchestrator that turns a niche or an existing website into a working React/Next.js + TailwindCSS frontend by delegating the heavy lifting to console AI agents instead of doing it itself.
UX-Forge is the "Jefe de Máquinas" (foreman): it never browses the web and never writes frontend code with its own hands. It plans, delegates, captures results, judges them, and re-delegates corrections.
The core architectural bet: orchestration is a separate concern from execution. A small, robust, well-tested Python core coordinates two specialized CLI agents that already live on the host:
| Agent | Role | What it does |
|---|---|---|
| Goose | Researcher / Scraper | Browses the web, scrapes pages, extracts DOM/CSS/colors/typography |
| Aider | Coder | Generates and edits frontend files from a spec |
The orchestrator adds only the thin reasoning layer via direct LLM calls
(litellm): turning raw research into a design system, and judging the
generated code. Everything else is delegation.
This keeps the orchestrator:
- Model-agnostic: swap `ollama/qwen3-14b-32k` for `gpt-4o` with one flag.
- Hang-proof: every delegated process has a hard `asyncio` timeout and is killed and reaped on overrun (so a stuck agent never holds the GPU).
- Portable: paths come from `$UX_FORGE_HOME`, so the same file runs on an EC2 host or a laptop.
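
The hang-proof property described above can be sketched as follows. This is a minimal illustration, not the code in `ux_agent.py`: the function name `run_with_timeout` and its return shape are assumptions made for the example.

```python
import asyncio
import sys


async def run_with_timeout(cmd: list[str], timeout: float) -> tuple[int, str]:
    """Run cmd and return (exit_code, output); kill and reap it on overrun.

    Hypothetical helper: the real orchestrator may be structured differently.
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into the captured output
    )
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        proc.kill()        # hard stop: a stuck agent may ignore a polite signal
        await proc.wait()  # reap the child so no zombie process lingers
        raise
    return proc.returncode, out.decode(errors="replace")
```

Calling `asyncio.run(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 1.0))` raises `asyncio.TimeoutError` after about one second, and by then the child has already been killed and reaped.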
The pipeline is strict-sequential and identical for both modes; only the prompts change. A failure in any phase aborts the run with a non-zero exit code (fail fast: never build on broken research).

    Phase 1  BENCHMARKER → Goose     → reports/benchmark.md
    Phase 2  PLANNER     → LLM       → reports/ui_plan.md   (design tokens + components)
    Phase 3  EXECUTOR    → Aider     → output_app/          (React/Next.js + Tailwind)
    Phase 4  EVALUATOR   → LLM→Aider → reports/ux_review.md (+ corrective edits)

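
A fail-fast sequential driver of this shape might look like the sketch below. `PhaseError` and `run_pipeline` are hypothetical names chosen for the example; the phases are passed in as plain callables so the driver itself stays mode-agnostic.

```python
import sys
from typing import Callable


class PhaseError(RuntimeError):
    """Raised by a phase that cannot produce its artifact (hypothetical type)."""


def run_pipeline(phases: list[tuple[str, Callable[[], None]]]) -> int:
    """Run phases in order; return 0 on success, 1 at the first failure."""
    for name, phase in phases:
        try:
            phase()
        except PhaseError as exc:
            print(f"[ERROR] {name} failed: {exc}", file=sys.stderr)
            return 1  # abort here: later phases never see broken input
    return 0
```

The return value is meant to be handed to `sys.exit()`, which gives the non-zero exit code on failure.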
- Benchmarker (Goose). Either researches a niche or scrapes a target URL. Writes a structured Markdown report.
- Planner (LLM). Reads the report and produces a Design System: a `## Design Tokens` block (color palette as named hex tokens, typography scale, spacing, radius), a `## Component List`, and `## Implementation Notes` mapping tokens to a `tailwind.config.js` theme.
- Executor (Aider). Runs inside `output_app/` and builds every component, the Tailwind config, and a composed sample page from the plan.
- Evaluator (LLM → Aider). Concatenates the generated source into a bounded digest, asks an LLM to compare it against the intended design system, and emits `VERDICT: PASS` or `VERDICT: FIX`. On `FIX`, it extracts the correction block and re-invokes Aider for a surgical corrective pass.
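
The Evaluator's verdict handling can be illustrated with a small parser. This is a sketch under one loud assumption: the correction block is taken to be everything after the `VERDICT:` line, which the text above does not specify. `parse_review` is a hypothetical name.

```python
import re

# Matches a line consisting only of "VERDICT: PASS" or "VERDICT: FIX".
VERDICT_RE = re.compile(r"^VERDICT:[ \t]*(PASS|FIX)[ \t]*$", re.MULTILINE)


def parse_review(review: str) -> tuple[str, str]:
    """Return (verdict, corrections); corrections is empty on PASS.

    Assumption: the corrections are whatever follows the verdict line.
    """
    match = VERDICT_RE.search(review)
    if match is None:
        raise ValueError("review contains no VERDICT line")
    verdict = match.group(1)
    corrections = review[match.end():].strip() if verdict == "FIX" else ""
    return verdict, corrections
```

Raising on a missing verdict, rather than defaulting to `PASS`, keeps a malformed LLM reply from being silently accepted.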
UX-Forge selects its mode from the CLI args. `--target` and `--url` are mutually
exclusive (enforced at parse time); `--modify` only applies in clone mode.
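
A minimal sketch of that parse-time validation with the standard library's `argparse`. The flag names come from the text; the help strings and the `parse_args` wrapper are assumptions for illustration.

```python
import argparse
from typing import Optional


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="ux_agent.py")
    # Exactly one of --target / --url must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--target", help="niche to research (niche mode)")
    mode.add_argument("--url", help="site to scrape (clone mode)")
    parser.add_argument("--modify", help="change to apply on top of the clone")
    args = parser.parse_args(argv)
    if args.modify and not args.url:
        parser.error("--modify only applies together with --url")  # exits with status 2
    return args
```

Both the mutual-exclusion check and the `--modify` guard fail before any agent is started, so a mistyped command costs nothing.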
Mode A (niche creation): research the leading products in a niche and build a fresh, generic design system inspired by them.

    python3 ux_agent.py --target "SaaS Finanzas"

Goose researches the 3 leading apps in the niche → Planner synthesizes a generic design system → Aider builds it.
Mode B (clone + modify): scrape an exact URL, reconstruct its design system, then apply a requested modification on top, without losing the original identity.

    python3 ux_agent.py --url "https://stripe.com" \
      --modify "haz que los botones sean redondeados y el fondo oscuro"

Goose produces a faithful UX/UI X-ray (DOM structure, CSS classes, exact hex colors, typography) → Planner reconstructs the system and folds the modification into the tokens (e.g. dark bg/surface, large border-radius) → Aider builds the modified clone.
The modification is baked into the plan at Phase 2, not bolted onto the build prompt at Phase 3, so Aider sees one coherent spec, never a contradictory "clone faithfully BUT change X" that weak models tend to drop.
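
One way to picture "baked into the plan" is a Planner prompt builder that takes the modification as an optional input. The function name and the prompt wording below are invented for illustration; only the three section names come from this README.

```python
from typing import Optional


def build_planner_prompt(report: str, modification: Optional[str] = None) -> str:
    """Assemble the Phase 2 prompt (hypothetical wording)."""
    parts = [
        "Produce a design system with the sections "
        "## Design Tokens, ## Component List and ## Implementation Notes.",
        "Research report:",
        report.strip(),
    ]
    if modification:
        # The change is resolved here, so the plan describes the modified
        # design only and Phase 3 never sees a "clone BUT change X" instruction.
        parts.append(
            "Apply this change directly to the tokens and components: "
            + modification.strip()
        )
    return "\n\n".join(parts)
```

With this shape, the Phase 3 prompt can be identical in both modes, because any difference has already been absorbed into the plan.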
Assumes `aider` and `goose` are already on `$PATH`.

    cd /home/ubuntu/ux_forge
    python3 -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
    cp .env.example .env        # edit if needed
    set -a; source .env; set +a

    # Mode A: niche
    python3 ux_agent.py --target "SaaS Finanzas"
    # Mode B: clone + modify
    python3 ux_agent.py --url "https://stripe.com" --modify "dark mode, rounded buttons"
    # Use OpenAI instead of the local model
    python3 ux_agent.py --target "SaaS Finanzas" --model gpt-4o
    # Skip the review/corrective pass
    python3 ux_agent.py --target "SaaS Finanzas" --no-eval

    python3 ux_agent.py --help                    # arg parsing + mode mutex
    python3 -c "import ux_agent; print('ok')"

| Env var | Default | Purpose |
|---|---|---|
| `UX_FORGE_HOME` | `/home/ubuntu/ux_forge` | Base working directory |
| `UX_FORGE_MODEL` | `ollama/qwen3-14b-32k` | LLM for Planner & Evaluator |
| `OLLAMA_API_BASE` | `http://localhost:11434` | Ollama endpoint |
| `OPENAI_API_KEY` | (none) | Only if using `--model gpt-4o` |

Timeouts are fields on the `Config` dataclass (`goose_timeout`,
`aider_timeout`, `llm_timeout`, `aider_fix_timeout`). Agent invocation lives
in `build_goose_cmd` / `build_aider_cmd`, the single place to change agent
flags.
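
As a rough picture of that layout, here is a frozen dataclass plus a Goose command builder. The timeout field names, the environment variables, and the `goose run --text` form come from this README; the `home` and `model` field names and every timeout value are placeholders, not the project's real defaults.

```python
import os
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class Config:
    # Environment is read at construction time, so one file runs on any host.
    home: Path = field(
        default_factory=lambda: Path(os.environ.get("UX_FORGE_HOME", "/home/ubuntu/ux_forge"))
    )
    model: str = field(
        default_factory=lambda: os.environ.get("UX_FORGE_MODEL", "ollama/qwen3-14b-32k")
    )
    # Placeholder values in seconds (assumptions, not the shipped defaults).
    goose_timeout: float = 900.0
    aider_timeout: float = 1800.0
    llm_timeout: float = 300.0
    aider_fix_timeout: float = 900.0


def build_goose_cmd(prompt: str) -> list[str]:
    """Single place that knows the Goose flag form."""
    return ["goose", "run", "--text", prompt]
```

Returning an argument list rather than a shell string means the prompt needs no quoting or escaping when it is passed to the subprocess call.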
- Goose flag form. `build_goose_cmd` defaults to `goose run --text "..."`. Some Goose builds expect an instructions file (`--instructions <path>`). Check `goose run --help` on your host.
- Aider in an empty dir. Aider prefers an existing file/git repo to add to. `git init` inside `output_app/` (or pre-touch a placeholder) before Phase 3.
- JS-heavy / bot-protected sites (e.g. Stripe, Cineplanet). Goose must render headless, not just fetch HTML, or the recon comes back with empty or invented colors. If clone recon returns fast with thin output, that's the tell.
- Buffered output. `run_cli` captures via `communicate()`, so a long phase shows no live output until the agent exits: silence ≠ hung.
- Single model on a T4. Planner and Evaluator each make one completion and don't overlap with Aider, so VRAM stays bounded. Don't run two pipelines at once on the same GPU.
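
If live output matters more than simplicity, the buffered-output caveat could be addressed by reading the agent's stdout line by line instead of calling `communicate()`. The sketch below is an alternative, not how `run_cli` works today; `run_streaming` and the `[agent]` prefix are hypothetical.

```python
import asyncio
import sys


async def run_streaming(cmd: list[str], timeout: float) -> tuple[int, list[str]]:
    """Echo each output line as it arrives, still under a hard timeout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.STDOUT
    )
    lines: list[str] = []

    async def pump() -> None:
        assert proc.stdout is not None
        async for raw in proc.stdout:  # yields one line at a time
            line = raw.decode(errors="replace").rstrip()
            lines.append(line)
            print(f"[agent] {line}", flush=True)
        await proc.wait()              # stdout closed: collect the exit code

    try:
        await asyncio.wait_for(pump(), timeout=timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        raise
    return proc.returncode, lines
```

Two limits are worth knowing: the child may still buffer its own output when it is not attached to a terminal, and asyncio's stream reader rejects single lines longer than its buffer limit (64 KiB by default).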

Repository layout:

    ux_forge/
    ├── ux_agent.py         # the orchestrator (single-file core)
    ├── requirements.txt    # litellm only; the rest is stdlib
    ├── .env.example        # config template (no secrets)
    └── README.md

Logs are structured: `[TIMESTAMP] [LEVEL] [ux-forge] [ACTION] message`.
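
A log line of that shape can be produced with the standard `logging` module, as in this sketch. The timestamp format and the helper names `make_logger` and `log` are assumptions; only the bracketed layout comes from the line above.

```python
import logging

FORMAT = "[%(asctime)s] [%(levelname)s] [ux-forge] [%(action)s] %(message)s"


def make_logger(stream=None) -> logging.Logger:
    """Build a logger emitting the bracketed format (stderr when stream is None)."""
    logger = logging.getLogger("ux-forge")
    logger.handlers.clear()  # avoid duplicate lines if called twice
    handler = logging.StreamHandler(stream)
    # ISO-like timestamp: an assumption, the real format is not documented here.
    handler.setFormatter(logging.Formatter(FORMAT, datefmt="%Y-%m-%dT%H:%M:%S"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def log(logger: logging.Logger, level: int, action: str, message: str) -> None:
    """Always supply the ACTION field, which the format string requires."""
    logger.log(level, message, extra={"action": action})
```

Routing every call through `log` guarantees the `action` field is present; calling `logger.info(...)` directly without it would make the formatter fail on the missing key.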
This framework was designed and built in a single working session as a zero-context greenfield, and the path there is worth recording because the process shaped the architecture.
- The brief. Build an agentic UX/UI framework for an AWS EC2 host (Ubuntu 22.04, pure Python 3.10+). Hard constraint: no heavy orchestration frameworks (no LangChain, no CrewAI), only the standard library plus `litellm`. The Python code must act purely as a foreman delegating to the pre-installed `aider` and `goose` CLIs.
- The first build (v1). Produced `ux_agent.py` with the 4-phase engine, a frozen `Config` dataclass as the single source of truth, structured logging, hang-proof `asyncio` subprocess handling (kill + reap on timeout), and a `litellm` wrapper running blocking calls in a thread so the event-loop timeout machinery stays responsive. Shipped with `requirements.txt`, `.env.example`, and a README. Smoke-tested: `--help` and import clean.
- A governance detour. The host enforces a pre-commit-style hook that blocks the assistant from writing application code directly; the standing rule is "write a Unit spec and delegate to Aider, no self-exceptions." The hook fired and blocked `ux_agent.py`. Rather than bypass it silently, the conflict was surfaced: this file is the orchestrator itself, not app code being patched mid-flow, and delegating a 400-line async framework to a 9B model on a T4 would produce far worse results. The user reviewed and explicitly authorized a scoped exception for `ux_forge/`. Governance was respected, not circumvented.
- The refactor (v2): two modes. The framework was extended to support both Mode A (niche creation) and Mode B (clone + modify). The key design decision: branch only at the prompt-building layer (two small helper methods), keeping the pipeline driver completely mode-agnostic. `argparse` gained a mutually-exclusive required group (`--target` XOR `--url`) and a clone-only `--modify`, all validated at parse time. Every guard was smoke-tested.
- First real clone attempt. A run against `https://www.cineplanet.com.pe/` appeared to "hang" after the `START` log. Diagnosis: not hung. `run_cli` buffers output until the agent exits, so Phase 1 is silent by design while Goose scrapes. This surfaced the broader lesson about JS-heavy, bot-protected sites needing a true headless render. The findings are captured in the Known caveats above.
- Model routing reality. It became clear the local 9B model lacks the reasoning depth for greenfield generation of this size, reinforcing the framework's own model-routing philosophy: strong models for design and judgment, delegated agents for mechanical execution.
The throughline: a robust orchestrator is worth more than a clever monolith. Keep the coordinator small, typed, and hang-proof; push the messy, model-heavy work out to specialized agents behind hard timeouts.
MIT (or your preference).