Add LLM proxy skills and API spec#31

Draft
jagadeshsid wants to merge 11 commits into mulesoft:master from jagadeshsid:siddhartha/llm-gateway-skills

Conversation

@jagadeshsid

Summary

Adds four JTBD skills covering the LLM Gateway workflows plus the private llm-proxy API spec (9 operations on /apimanager/xapi/v1).

Skills (4)

| Skill | Steps | Purpose |
| --- | --- | --- |
| create-llm-proxy-model-based-routing | 8 | Create an LLM proxy that routes by model field in the request body |
| create-llm-proxy-semantic-routing | 10 | Create an LLM proxy that routes by prompt semantics (Semantic Service Config + Global Prompt Topics) |
| apply-token-rate-limiting-to-llm-proxy | 5 | Apply the llm-token-rate-limit policy (groupId 68ef9520-…, assetId llm-token-rate-limit, version 1.0.0) to an existing LLM proxy |
| request-llm-proxy-access | 5 | Mint a client_id / client_secret and create a contract for a consumer |

API spec

  • apis/llm-proxy/api.yaml — 9 operations: listLlmRouteConfigurations, listEnvironmentLlmProxies, createEnvironmentLlmProxy, getGatewayTargetApisByPortAndPath, listSemanticServiceConfigs, createSemanticServiceConfig, getSemanticServiceConfig, listGlobalPromptTopicsBySsc, createGlobalPromptTopic
  • apis/llm-proxy/exchange.json (groupId: f1e97bc6-…anypoint-platform, assetId: llm-proxy-api, version: 1.0.0, visibility: private)
  • apis/llm-proxy/examples/responses/ — 11 captured example responses (secrets redacted)

Validation

  • make validate-api API=apis/llm-proxy — basic OAS 0/0, governance 0/0
  • python3 scripts/build/validate_jtbd.py — all 4 skills PASS individually
  • python3 scripts/build/validate_xorigin.py apis/llm-proxy — no violations
  • make validate-descriptions — no violations
  • make generate-portal — portal renders cleanly; [oas] LLM Proxy API + 4 [agent-skill] entries in registry

Verification approach

Payload shapes, URLs, and response shapes are grounded in:

  1. Anypoint UI source: api-manager-ui-node-lib repo. Every URL, payload field, and enum value traces back to a specific file/line (e.g., AddLlmProxyWizard.js:596-736 for the LLM proxy POST shape, routeConfigs.js:21-276 for the per-provider field catalog, constants.js:703-706 for the approvalMethod enum).
  2. Live HAR captures from stgx.anypoint.mulesoft.com — one model-based proxy create flow and one semantic proxy create flow (with both basic and advanced SSCs), captured end-to-end and diffed against the skill claims.
  3. Live 200 round-trip through the gateway — invoked a deployed LLM proxy with valid credentials; confirmed every x-llm-proxy-* response header documented in the skills, plus the 401 / fallback / model-override behaviors.

Test plan

  • Reviewer can run make validate-api API=apis/llm-proxy locally and confirm 0 violations
  • Reviewer can run python3 scripts/build/validate_jtbd.py skills/<skill>/SKILL.md . for each of the four new skills
  • Reviewer can run make generate-portal and verify the LLM Proxy API spec page + four skill HTML pages render without errors
  • CLA: I will sign the Salesforce CLA on first PR submission as prompted

Notes for review

  • The repo's pre-existing run-agent-scan-and-view-results skill currently fails validate-jtbd (missing YAML frontmatter). That is unrelated to this PR and predates these changes.
  • The LLM proxy API is marked visibility: private consistent with api-portal-xapi (the only other private API in the repo). Private APIs render the spec into portal/apis/<name>/api.yaml but skip the standalone HTML detail page by design.
  • A standalone review document with full source citations and a "Verified by HAR" appendix is available on request — happy to share if useful for reviewers.

Made with Cursor

Adds four JTBD skills covering the LLM Gateway workflows (model-based
routing, semantic routing, token rate limiting, consumer access) and
the private llm-proxy API spec (9 operations on /apimanager/xapi/v1).

Skills:
- create-llm-proxy-model-based-routing  (8 steps)
- create-llm-proxy-semantic-routing     (10 steps)
- apply-token-rate-limiting-to-llm-proxy (5 steps)
- request-llm-proxy-access              (5 steps)

API: apis/llm-proxy/ (api.yaml, exchange.json, 11 example responses)

Payload shapes and URLs are grounded in the Anypoint UI source
(api-manager-ui-node-lib) and live HAR captures from stgx, including
a successful 200 invocation through the gateway. All four validators
pass: validate-jtbd, validate-api, validate-xorigin,
validate-descriptions; make generate-portal renders the spec and
all four skill pages cleanly.

Made-with: Cursor
@jagadeshsid jagadeshsid requested a review from a team as a code owner April 27, 2026 12:44
@jagadeshsid jagadeshsid changed the title from "Add LLM Gateway skills and llm-proxy API spec" to "Add LLM proxy skills and API spec" Apr 27, 2026
Three alignment fixes after a deep validator + cross-reference pass:

1. Convert input refs to canonical `from: { variable: X }` form
   The first cut used `from: { step: <name>, output|input: X }`, an
   evolved schema documented in the (now-archived) anypoint-public-api-specs
   repo. mulesoft-dx's validator + job-template + every existing skill use
   the simpler `from: { variable: X }`. Reverted 61 references across all
   four LLM skills; validators now run cleanly with zero warnings (was
   19+26+8+8 warnings on the prior commit).

2. Wire up the missing example response in api.yaml
   `gatewayTargetApisOccupied.json` was committed but not referenced from
   any operation. Switched the corresponding `getGatewayTargetApisByPortAndPath`
   200 response to OAS 3.0 `examples:` (plural) with named entries:
   `Available` (the existing case) and `Occupied` (the conflict case).
   Both cases now show up in generated portal docs.

3. Remove broken cross-skill reference in `request-llm-proxy-access`
   The Related Jobs section linked `manage-consumer-contracts`, which
   exists in older internal repos but was never migrated to mulesoft-dx.
   Dropped the bullet rather than introduce a dead link.
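
For reviewers, the reference conversion in item 1 can be sketched roughly as below. This is an illustration only, not the script used for the 61 reverts; the function name `to_variable_ref` is hypothetical, and it assumes the variable name is the same `X` that the step-scoped form carried under `output` or `input`.

```python
def to_variable_ref(ref):
    """Rewrite a step-scoped input ref into the simpler variable form.

    Assumed shapes (taken from the commit text, not from the validator):
      old: {"step": <name>, "output": X}  or  {"step": <name>, "input": X}
      new: {"variable": X}
    """
    if "variable" in ref:
        # Already canonical; return a copy so callers can mutate safely.
        return dict(ref)
    if "step" in ref:
        for key in ("output", "input"):
            if key in ref:
                return {"variable": ref[key]}
    raise ValueError(f"unrecognized reference shape: {ref!r}")


print(to_variable_ref({"step": "create-proxy", "output": "apiInstanceId"}))
```

The step name is dropped on purpose: the canonical form carries only the variable name.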

Also adds `apis/llm-proxy/examples/responses/README.md` cataloguing every
example file: which `api.yaml` operation references it, and the two
supplementary references that document responses from other API specs
(used as concrete shapes by the LLM Gateway skills).

Validators all green: validate-jtbd (per skill, 0 warnings + 0 errors),
validate-api (basic OAS 0/0, governance 0/0), validate-xorigin (clean),
validate-descriptions (clean).

Made-with: Cursor
@jagadeshsid jagadeshsid marked this pull request as draft April 27, 2026 13:15
Walked the four skills end-to-end as if I were the agent and identified
specific spots where the agent would have to guess or where it could miss
a step. Eight surgical edits across the four skills:

Skill 1 (model-based routing):
- Step 6: when the user hasn't specified an inbound `properties.platform`,
  default to `openai` (universal client format, supports all five
  providers via transcoding). Pick `gemini` only if the user explicitly
  wants Gemini-format consumer bodies (single-route only).
- Step 7: the `fallbackRoute` value MUST match a route's `label` exactly.
  Server silently accepts mismatched values, but at runtime fallback
  never triggers — explicit warning so the agent picks from the labels
  it just constructed instead of asking the user for free text.
- Step 7: capture `routing[*].rules.headers.x-routing-header` as a
  `modelPrefixes` output, and surface those values to the user in
  "What happens next" so they know which `<prefix>/<model>` strings are
  valid in the request body.
- Tips: add a mixed-mode credentials note — modes are per-upstream, so a
  proxy can have OpenAI on a static key AND Gemini on DataWeave at the
  same time. Avoids the "pick one mode for the whole proxy" assumption.
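
A minimal sketch of the Step 7 `fallbackRoute` guard, for illustration. It assumes each route is a dict with a `label` key; the real payload nesting may differ, and `check_fallback_route` is a hypothetical helper rather than anything in the skill.

```python
def check_fallback_route(fallback_route, routes):
    """Return fallback_route if it exactly matches one route label.

    The match is case-sensitive on purpose: per the commit text, the server
    accepts a mismatched value silently and fallback then never triggers.
    """
    labels = [route["label"] for route in routes]
    if fallback_route not in labels:
        raise ValueError(
            f"fallbackRoute {fallback_route!r} is not one of the route labels {labels}"
        )
    return fallback_route


# Hypothetical routes; the labels here are placeholders.
routes = [{"label": "openai"}, {"label": "gemini"}]
print(check_fallback_route("gemini", routes))
```

Checking against the labels just constructed lets the agent offer a pick list instead of asking for free text.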

Skill 2 (semantic routing):
- Step 4: utterance-elicitation guidance. When the user hasn't supplied
  utterances, the agent should prompt per-topic for 5–10 diverse phrasings,
  refuse fewer than 5, and challenge utterances that overlap multiple topics.
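
The elicitation rules above could be checked along these lines. This is a sketch under stated assumptions: topics arrive as a dict of topic name to utterance list, and "overlap" is approximated as the same normalized text appearing under more than one topic. Semantic overlap between differently worded utterances still needs the agent's judgment. `review_topics` is a hypothetical name.

```python
def _normalize(text):
    # Lowercase and collapse whitespace so trivial variants compare equal.
    return " ".join(text.lower().split())


def review_topics(topics, minimum=5):
    """Return (too_few, overlapping) for a {topic: [utterance, ...]} dict.

    too_few:     topic names with fewer than `minimum` distinct utterances
    overlapping: {normalized utterance: [topics that share it]}
    """
    too_few = sorted(
        name
        for name, utterances in topics.items()
        if len({_normalize(u) for u in utterances}) < minimum
    )
    owners = {}
    for name, utterances in topics.items():
        for utterance in utterances:
            owners.setdefault(_normalize(utterance), set()).add(name)
    overlapping = {
        text: sorted(names) for text, names in owners.items() if len(names) > 1
    }
    return too_few, overlapping
```

A topic in `too_few` should be refused until the user supplies more phrasings; an entry in `overlapping` is a prompt to ask which topic the utterance really belongs to.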

Skill 3 (token rate limiting):
- Step 5: post-apply verification. Make a test call after ~30s and
  confirm the `x-llm-proxy-ratelimit` header is present in the 200
  response. Header absence after 90s suggests the policy didn't apply.
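
The timing described above (first check at ~30s, give up at 90s) might look like the sketch below. `fetch_headers` is a hypothetical callable that performs one test call and returns the response headers as a dict; the caller is assumed to confirm the 200 status separately. The 10-second retry interval is an assumption, not something the skill specifies.

```python
import time


def wait_for_ratelimit_header(fetch_headers, first_check=30, deadline=90,
                              interval=10, sleep=time.sleep):
    """Poll until the x-llm-proxy-ratelimit header appears.

    Returns True once the header is seen, False if it is still absent
    at `deadline` seconds (which suggests the policy did not apply).
    """
    sleep(first_check)
    elapsed = first_check
    while True:
        # Header names are case-insensitive, so compare in lowercase.
        names = {name.lower() for name in fetch_headers()}
        if "x-llm-proxy-ratelimit" in names:
            return True
        if elapsed >= deadline:
            return False
        sleep(interval)
        elapsed += interval
```

Passing `sleep` as a parameter keeps the function testable without real waiting.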

Skill 4 (request access):
- Step 5: SLA-tier discovery. Before posting the contract, list tiers
  via `listOrganizationsEnvironmentsApisTiers` to detect tiered proxies
  proactively rather than recovering from a 400.
- Invocation Next Steps: lead with `/chat/completions` as the universal
  subpath and document that `/responses` is OpenAI-only — the
  transcoding policies for non-OpenAI providers don't currently
  translate `/responses` semantics.
- Invocation Next Steps: add explicit guidance on how to find a proxy's
  valid `model` prefixes — read `routing[*].rules.headers.x-routing-header`
  on the existing proxy and construct `<prefix>/<model>`.
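
As an illustration of the prefix lookup in the last bullet, the sketch below walks `routing[*].rules.headers.x-routing-header` on a proxy that has already been fetched and parsed into a dict. The sample proxy, its prefix values, and the model name are placeholders, and both helper names are hypothetical.

```python
def model_prefixes(proxy):
    """Collect distinct x-routing-header values, in route order."""
    prefixes = []
    for route in proxy.get("routing", []):
        headers = route.get("rules", {}).get("headers", {})
        prefix = headers.get("x-routing-header")
        if prefix and prefix not in prefixes:
            prefixes.append(prefix)
    return prefixes


def model_string(prefix, model):
    """Build the <prefix>/<model> value for the request body's model field."""
    return f"{prefix}/{model}"


# Placeholder proxy shape; only the path to x-routing-header comes from the skill.
proxy = {
    "routing": [
        {"rules": {"headers": {"x-routing-header": "openai"}}},
        {"rules": {"headers": {"x-routing-header": "gemini"}}},
    ]
}
print([model_string(p, "example-model") for p in model_prefixes(proxy)])
```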

All four skills still pass `validate-jtbd` per-skill with zero warnings.
`make validate-api` clean. `make validate-descriptions` clean.

Made-with: Cursor
Adds descriptions to the few inner-array `items:` and the route-rule
`headers` map that previously inherited context from their parents.
Surfaced by an end-to-end audit pass (every property under every
request body now has a description).

Made-with: Cursor
Addresses lead review feedback that the combined create-llm-proxy-
semantic-routing skill conflates the basic and advanced flows. The two
flows differ enough at the topic-create endpoint, the SSC payload, the
deployment story, and the post-create policy attach that one skill was
forcing branchy logic on the agent.

Changes:

- Add skills/create-llm-proxy-semantic-routing-basic/SKILL.md (10 steps).
  Uses /prompt-topics, no vector DB, documents the platform's auto-rebind
  behavior and the agent's need to read back the resolved SSC, no manual
  policy attach (basic variants auto-attach).

- Add skills/create-llm-proxy-semantic-routing-advanced/SKILL.md (12 steps).
  Uses /global-prompt-topics, requires a vector DB, includes a new step
  for fetching and running the GET /semantic-setup-script that hydrates
  the vector DB, plus the manual semantic-routing-policy-* apply step
  (advanced variants are not auto-attached).

- Remove skills/create-llm-proxy-semantic-routing/.

- Add createPromptTopic and getSemanticSetupScript operations to
  apis/llm-proxy/api.yaml plus a captured promptTopic.json example
  (live-tested against stgx).

- Both new skills:
  - Restructure topic + utterance gathering as inline-first, with a
    CSV/JSON file-path fallback for larger sets (the agent parses the
    file itself).
  - Consolidate ask-the-user inputs into the Prerequisites section so
    the agent reads it as a checklist and prompts up front.

- Skill 1 (model-based) gets the same Prerequisites consolidation.

- Update cross-references in Skill 1, Skill 3, Skill 4 to point at the
  two new skill names.

All five validators pass clean. Cross-ref audit: 29 output JSONPaths
across the 5 skills resolve in their captured example responses.

Made-with: Cursor
Addresses lead review feedback that the create skills should be
config-driven, with explicit verification curls per creation step and
a final end-to-end test step.

Common changes (all three create skills):

- New "Read llmproxy.yaml first" section at the top of each skill
  describing a recommended input file alongside .env. The skill text
  tells the agent to be flexible about field names and structure since
  customer YAMLs may vary; per-step lookup tables specify which YAML
  fields short-circuit which steps.
- Conditional skip notices on Steps 1 + 2 (org/env), Step 4 (gateway
  target), Step 5 (port + base path defaults), based on llmproxy.yaml
  contents.
- Asset publish broken into three explicit phases — POST, poll the
  publicationStatusLink, verify attributes — each with its own curl
  block, per the lead's "polling and verifying are two separate steps".
- New verification curl after the proxy create (Step 7 in model-based,
  Step 9 in semantic-basic, Step 10 in semantic-advanced) so customers
  can check the API instance metadata before the deployment poll.
- New "Test the Proxy with a real request" section at the end of each
  create skill, with concrete curls for happy-path + fallback cases
  and triage guidance per response shape. References
  request-llm-proxy-access for credential creation.
- Soften apiVersionStatus first-poll claim — agent E2E test on stgx
  showed the field is null on the very first poll (~3s in), then
  unregistered, then active. Gate is unchanged ("== active") but the
  explanatory wording is corrected.
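
The deployment gate from the last bullet, sketched for illustration. `read_status` is a hypothetical callable that returns the current `apiVersionStatus` (possibly `None`); the attempt count and interval are assumptions.

```python
import time


def wait_until_active(read_status, attempts=20, interval=3, sleep=time.sleep):
    """Poll apiVersionStatus until it equals "active".

    Returns the list of statuses observed. None and "unregistered" are
    treated as "keep polling", matching the sequence seen on stgx.
    """
    seen = []
    for _ in range(attempts):
        status = read_status()
        seen.append(status)
        if status == "active":
            return seen
        sleep(interval)
    raise TimeoutError(f"status never reached 'active'; saw {seen}")
```

The gate is a plain equality check, so a null first poll is an expected intermediate state rather than a failure.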

Skill 2 (semantic-basic) specific:
- Replaced the "Trade-offs vs advanced" comparison table with a
  smaller "Basic-mode limits" callout — keeps the limits visible
  without confusing readers who came here for basic only.
- Added embedding service provider name to Prerequisites alongside
  key + model.
- Step 3 (list SSCs), Step 4 (create SSC) marked conditional on
  routing.ssc.id being set in YAML.
- Topic + utterance source priority now lists llmproxy.yaml inline,
  llmproxy.yaml topics_csv/topics_json paths, then in-chat inline,
  then file path the user names in chat.
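
The source priority in the last bullet can be read as a first-match lookup. The sketch below assumes the parsed `llmproxy.yaml` is a flat dict with an inline `topics` key (an assumed name) next to the `topics_csv` / `topics_json` path keys; since customer YAMLs may vary, real lookups would need to be more forgiving. `pick_topic_source` is hypothetical.

```python
def pick_topic_source(config, chat_topics=None, chat_file=None):
    """Return (source_kind, value) for the highest-priority topic source.

    Priority: YAML inline, YAML file paths, in-chat inline, in-chat file path.
    Returns None when nothing is available and the agent must ask the user.
    """
    if config.get("topics"):
        return ("yaml-inline", config["topics"])
    for key in ("topics_csv", "topics_json"):
        if config.get(key):
            return ("yaml-file", config[key])
    if chat_topics:
        return ("chat-inline", chat_topics)
    if chat_file:
        return ("chat-file", chat_file)
    return None
```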

Skill 3 (semantic-advanced) specific:
- Steps 3, 4, 5 all conditional on routing.ssc.id (reuse existing SSC
  scenario).
- Step 6 (getSemanticSetupScript) kept and emphasized — verified live
  on stgx that this is a separate API call from the SSC create
  response, and that without running the script the vector DB stays
  empty and every prompt routes to fallback.

Validation:
- All 5 JTBD validators PASS (0 warnings, 0 errors each)
- validate-api / validate-xorigin / validate-descriptions clean
- Cross-ref: 29/29 output JSONPaths resolve in their captured examples
- Live agent E2E test against the revised model-based skill succeeded
  end-to-end on stgx in ~30s with zero outside lookups, zero hard gaps,
  and zero ambiguities (only one wording fix applied, included above).

Made-with: Cursor
Five comments surfaced when an agent ran the semantic-basic skill
interactively (without a pre-baked llmproxy.yaml). All five are
addressed; one of them required removing a previously-documented
manual step that the platform has since started auto-attaching.

Comment 1 — Routes elicitation should be explicit and unbounded.
- All three create skills: the Step that generates the proxy POST
  (or, for semantic skills, the Step that creates topics) now opens
  with a "Routes — elicit these from the user" sub-section. The
  prose explicitly says don't assume a fixed number of routes (not
  two, not any specific count), don't assume which providers, and
  don't assume any particular credential mode. Per route the agent
  asks: provider (any subset of the catalog), target model, key mode
  (static or DataWeave), label.
- For semantic skills, also calls out that multiple topics CAN map
  to the same route (many-to-one is normal — Finance + Investing
  both routing to the same OpenAI route, etc.). The validation
  checklist now includes "every topic's route_label matches one of
  the route labels you defined above".
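
A small sketch of the new checklist item. It assumes each topic is a dict with `name` and `route_label` keys; `route_label` comes from the checklist wording, while `name` and the function name are assumptions.

```python
def unmatched_topics(topics, route_labels):
    """Return names of topics whose route_label matches no defined route."""
    labels = set(route_labels)
    return [topic["name"] for topic in topics if topic["route_label"] not in labels]


# Many-to-one is fine: Finance and Investing share the same route.
topics = [
    {"name": "Finance", "route_label": "openai"},
    {"name": "Investing", "route_label": "openai"},
    {"name": "Travel", "route_label": "gemini-route"},
]
print(unmatched_topics(topics, ["openai", "gemini"]))  # ['Travel']
```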

Comment 2 — Routing policy attach is automatic for advanced too
(verified live on stgx 2026-04-30 — proxy 4696396 has
semantic-routing-policy-openai-qdrant auto-attached).
- Advanced skill: removed the manual policy-attach Step 11
  entirely. Renumbered the deployment poll from Step 12 to Step 11.
  Updated the overview, the API endpoints table, the URL gotchas
  list, the completion checklist, and the troubleshooting prose
  to reflect that the platform now auto-attaches the
  semantic-routing-policy-<provider>-<vectordb> variant. Net
  effect: 12 steps → 11 steps. Skill is shorter and matches
  current platform behavior.

Comment 3 — Step 3 should filter by serviceType + ASK the user.
- semantic-basic Step 3: explicit "filter to serviceType: basic
  client-side" instruction + an explicit "ASK the user whether to
  use one of these or create a new one" prompt. Don't silently
  default. The auto-rebind quirk caveat remains.
- semantic-advanced Step 3: same pattern, filter to advanced.

Comment 4 — Don't auto-select the Flex Gateway target.
- All three create skills' Step 4/6/7 (gateway target listing) now
  end with "Surface the eligible targets to the user and ASK them
  to pick one. Do NOT auto-select even if there's only one
  eligible candidate or one with a 'right-looking' name."

Comment 5 — Client app create body shape — be strict, surface the
common failure modes.
- request-llm-proxy-access Step 4: new "Critical — request body
  shape" callout. Lists exactly the two-field body shape that
  works (verified live 2026-04-30: `{name, description}` returns
  201 with clientId + clientSecret), and explicitly enumerates
  fields the agent should NOT include (applicationId, clientId,
  clientSecret as null inputs; targetApiInstanceId; redirectUri,
  homepageUrl) including the specific 400 message that confused
  agents in a previous run.
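
To make the two-field rule concrete, here is a sketch of building and checking the client app body before the POST. The helper names are hypothetical; only the allowed and disallowed field names come from the callout.

```python
ALLOWED_FIELDS = {"name", "description"}


def client_app_body(name, description):
    """Build the two-field body that returned 201 in the live check."""
    return {"name": name, "description": description}


def unexpected_fields(body):
    """List any keys beyond name/description so they can be dropped first.

    This catches applicationId, clientId, clientSecret, targetApiInstanceId,
    redirectUri and homepageUrl, including when they are present as null.
    """
    return sorted(set(body) - ALLOWED_FIELDS)


body = client_app_body("demo-consumer", "Consumer app for the LLM proxy")
print(unexpected_fields(body))                        # []
print(unexpected_fields({**body, "clientId": None}))  # ['clientId']
```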

Validation:
- All 5 JTBD validators PASS (0 warnings, 0 errors each)
- validate-api / validate-xorigin / validate-descriptions clean
- Portal generates clean

Made-with: Cursor
Five subagents tested model-based×{YAML,inline}, semantic-basic×{YAML,inline},
and apply-token-rate-limiting on stgx in parallel. All five completed
end-to-end (cleanup included). Aggregate findings drove the changes below.

Hard skill errors corrected:

- **Model-based routing supports at most ONE route per provider.** The
  agent that elicited 3 routes (testing the "you can have 1, 3, 5+
  routes" claim) had its third route silently deduped by the server —
  upstreams are merged on (provider, uri, label), and the routing
  policy keys off the resolved provider, not the route's
  x-routing-header value. Skill 1's Step 7 elicitation now spells out
  the constraint: max 5 routes (one per supported provider), with an
  explicit "reject duplicate provider" guard during elicitation.
- **Upstream `label` MUST equal the provider identifier.** Skill 1
  never said this; the agent hit `400 BadRequestError: "Upstream in
  route 'X' has provider 'openai' which requires upstream label to be
  'openai', but found label 'openai-full'"` on first try. Now stated
  explicitly with the exact 400 message included.
- **`endpointUri` is often comma-separated** (`<cloudhub-url>,<custom>`).
  All three create skills' "Test the Proxy" sections now warn about
  this and document the recovery (split on `,` and pick first, OR fetch
  canonical URL from `gateway-targets/{id}.configuration.ingress.publicUrl`).
- **Deploy-failed-with-no-error recovery** (target-side flake — even
  with `RUNNING+ready=true`). All three create skills' deployment-poll
  Common Issues now document the full-instance PATCH-to-different-target
  recovery lever, plus a tip that long polls should tolerate transient
  curl errors (`--max-time 20 --retry 3 --retry-delay 2`).
- **apply-token-rate-limiting was missing Cleanup + verification curl.**
  Added a Cleanup section (the policy-DELETE pattern), and a
  copy-pasteable verification curl with `jq` filter showing the policy
  with its applied configuration. Also: corrected the catalog-expansion
  magnitudes (~101→629 verified live, was claimed ~130→1200), and
  softened the `assetVersion: "1.0.0"` "Latest stable" wording since
  `1.0.1` is also published.
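
The first three corrections lend themselves to pre-flight checks. The sketch below assumes each elicited route is a flat dict with `provider` and `label` keys, which simplifies the real upstream nesting; `route_problems` and `first_endpoint` are hypothetical helpers.

```python
def route_problems(routes, max_routes=5):
    """Return human-readable problems for a list of elicited routes."""
    problems = []
    seen = set()
    for route in routes:
        provider, label = route["provider"], route["label"]
        if provider in seen:
            # The server would silently dedupe this route.
            problems.append(f"duplicate provider: {provider}")
        seen.add(provider)
        if label != provider:
            # The server rejects this with a 400 BadRequestError.
            problems.append(f"label {label!r} must equal provider {provider!r}")
    if len(routes) > max_routes:
        problems.append(f"too many routes: {len(routes)} > {max_routes}")
    return problems


def first_endpoint(endpoint_uri):
    """Pick the first URL from a possibly comma-separated endpointUri."""
    return endpoint_uri.split(",")[0].strip()
```

Fetching the canonical URL from the gateway target, as the commit text notes, is the alternative to `first_endpoint` when the order of the comma-separated values is in doubt.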

Ambiguities clarified:

- **`approvalMethod` on Skill 1 Step 7** — was hard-coded `value: null`
  in the YAML block but the lookup table at the top says it's
  YAML-driven. Now consistently YAML-driven with `null` as the default.
- **Auto-rebind, send `null` on every call** — Skill 2 Step 5 now
  explicitly says "do NOT propagate topic-1's resolved SSC into
  topic-2's request body; send `null` on every prompt-topic POST and
  capture the response value each time."
- **`embedding.*` block in YAML when Step 4 is skipped** — Skill 2
  Step 5 now flags that the YAML's embedding config is silently
  disregarded if the env already has basic SSCs (auto-rebind picks
  the env default regardless), so the agent surfaces this to the user.
- **Threshold inclusivity** — both semantic skills now say the
  comparison is "strictly greater than" the threshold (a score of
  exactly the threshold falls back). Verified: score 0.50 with
  threshold 0.50 falls back.
- **`.env` orgId reconciliation** — Skill 1 Step 1 now notes that
  `.env` orgIds are commonly stale; trust `listMe` over `.env` and
  surface mismatches.
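
The threshold rule reduces to a strict comparison. A one-function sketch, with a hypothetical name, that mirrors the verified 0.50 case:

```python
def routes_to_topic(score, threshold):
    """True when the prompt routes to the matched topic, False on fallback.

    The comparison is strictly greater than, so an exact tie falls back.
    """
    return score > threshold


print(routes_to_topic(0.50, 0.50))  # False: exact tie falls back
print(routes_to_topic(0.51, 0.50))  # True
```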

Validation:
- All 5 JTBD validators PASS (0 warnings, 0 errors)
- validate-api / validate-xorigin / validate-descriptions clean
- Portal regenerates clean
- 5 parallel agent E2E runs on stgx all completed (resources cleaned up)

Made-with: Cursor