Merged
154 changes: 53 additions & 101 deletions AGENTS.md
@@ -1,131 +1,83 @@
# AGENTS.md

This file is principles — the contracts that stay true while implementations
churn. For how to actually run, boot, share, or navigate things today
(fresh-worktree setup, dev servers, ports, environment gotchas), see
[RUNNING.md](RUNNING.md) (which may lag reality; it says so itself). For
writing e2e scenarios, see [e2e/AGENTS.md](e2e/AGENTS.md). Run
`bun run bootstrap` first in any fresh checkout or worktree.

## Task Completion Requirements

- Use Effect Vitest for tests.
- Run targeted tests with `vitest run ...` when working on a scoped area.
- The root/package `bun run test` scripts are allowed because they delegate to
Vitest.
- NEVER run `bun test`.
- For code changes, run the narrowest useful verification before handing back.
- For broad or merge-ready changes, the full gates are `bun run format:check`,
`bun run lint`, `bun run typecheck`, and `bun run test`.
- Always run `bun run format` before opening a PR so the diff lands
already-formatted. Only commit formatting changes to files your branch
actually touches: if `format` rewrites unrelated pre-existing files, leave
those out of the PR (stage just your files).
Principles: the contracts that stay true while implementations churn. For how to
run, boot, share, or navigate things today, see [RUNNING.md](RUNNING.md); for
e2e scenarios, [e2e/AGENTS.md](e2e/AGENTS.md). Run `bun run bootstrap` first in
any fresh checkout or worktree.

## Task Completion

- Tests use Effect Vitest. Run scoped tests with `vitest run ...`. `bun run test`
is fine (it delegates to Vitest); NEVER run `bun test`.
- Run the narrowest useful verification for a change. For broad or merge-ready
work the full gates are `bun run format:check`, `lint`, `typecheck`, `test`.
- Run `bun run format` before opening a PR, staging only files your branch
touched (leave unrelated files `format` rewrites out of the PR).

## Handing Back Work: Evidence, Not Assertions

Comment on lines 17 to 18

P1: Abbreviated commands lose `bun run` prefix

The new line lists the full gates as `bun run format:check`, `lint`, `typecheck`, `test`: only the first command keeps its `bun run` prefix. Since this file is parsed by AI agents as executable instructions, an agent following it literally may run `lint`, `typecheck`, or `test` as bare shell commands instead of `bun run lint`, `bun run typecheck`, and `bun run test`. The original spelled each one out in full to prevent exactly this ambiguity.

Suggested change
## Handing Back Work: Evidence, Not Assertions
work the full gates are `bun run format:check`, `bun run lint`, `bun run typecheck`, `bun run test`.

Context Used: AGENTS.md (source)
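
A minimal sketch of the fully spelled-out form discussed in this comment. Only the four gate names come from AGENTS.md; the `run_gates` helper and the `DRY_RUN` variable are hypothetical, introduced here for illustration, and are not part of the repo. By default the helper only prints each command with its `bun run` prefix rather than executing anything:

```shell
#!/bin/sh
# Gate names as listed in AGENTS.md. The helper and DRY_RUN are
# hypothetical, for illustration only.
GATES="format:check lint typecheck test"

run_gates() {
  for gate in $GATES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Default: print the fully qualified command without running it.
      echo "bun run $gate"
    else
      # Execute the gate through bun; stop at the first failure.
      bun run "$gate" || return 1
    fi
  done
}
```

Calling `run_gates` lists the four prefixed commands; `DRY_RUN=0 run_gates` would actually execute them, assuming `bun` and the corresponding package scripts are available. Note that the `test` gate is `bun run test` (the script that delegates to Vitest), never `bun test`.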


"Done" is something the user can open, not a claim. When work changes what a
user sees or touches, the handoff has three parts, delivered unprompted:

1. **Watch it** — an e2e scenario covers the change, and the handoff links
directly to the specific run(s) that prove it, with one line each on what
to look at. Never hand back a bare wall of green results: the user's
question is "show me the new thing working," not "is everything healthy?"
2. **Touch it** — leave the session's dev server running and reachable over
the user's tailnet, with credentials, so they can take over and poke at
it. The instance you already booted for e2e IS this — leave it up rather
than standing up something separate.
3. **What to try** — name the paths worth exercising by hand, especially
ones no scenario pins yet. Honesty about coverage gaps is part of the
handoff. A human driving a real browser from another device reaches
states the test harness structurally cannot; invite that.

The same machinery runs in reverse: you can seed an environment INTO a
state — reproduce a bug live, stage data for the user to take over, set up
a walkthrough — and hand across the link. "Here's the broken state, live"
beats a paragraph describing it.

If no scenario covers the change yet, that is the cue to write one. When a
change is user-visible, embed the run's recording in the PR description —
reviewers should see the change, not just read about it.

Don't memorize the mechanics (ports, viewer, sharing commands) — discover
them from RUNNING.md and the code; they change.
"Done" is something the user can open, not a claim. When work changes what a user
sees, hand back: an e2e run that proves it (link the specific run with one line
on what to look at, not a wall of green), the dev server left running so they can
poke at it, and the paths worth trying by hand including ones no scenario covers
yet. If no scenario covers the change, write one, and embed its recording in the
PR. The machinery runs in reverse too: seed an environment into a state
(reproduce a bug live, stage data) and hand across the link.

## Service Emulators

When a test or demo needs an upstream API, OAuth/OIDC provider, or webhook
source, use the `@executor-js/emulate` emulators (GitHub, Google, Stripe,
Resend, WorkOS, and a dozen more) instead of writing a stub. They are
wire-level and stateful — real SDKs run against them unmodified — and each
serves a full OpenAPI spec ready for addSpec, mints real-shaped credentials,
runs working OAuth flows, and records every call in a request ledger you can
assert against. Hosted instances exist at `https://<service>.emulators.dev`
with zero setup. See the `emulate` skill
(`.claude/skills/emulate/SKILL.md`) for the control-plane reference and
recipes.

The emulators are a standalone project (`github.com/UsefulSoftwareCo/emulate`),
not vendored here — this repo only consumes the published `@executor-js/emulate`
package. You have full autonomy to change, publish, and deploy the emulators,
working directly on their `main`; the skill covers the loop. Don't re-introduce
a `vendor/` submodule for them.
For any test or demo needing an upstream API, OAuth/OIDC provider, or webhook
source, use the `@executor-js/emulate` emulators instead of writing a stub:
wire-level and stateful, real SDKs run unmodified, each serves a full OpenAPI
spec, mints real-shaped credentials, and records every call in a request ledger
to assert against. Hosted at `https://<service>.emulators.dev`. See the `emulate`
skill (`.claude/skills/emulate/SKILL.md`). They are a standalone project
(`github.com/UsefulSoftwareCo/emulate`) consumed here as the published package:
full autonomy to change, publish, and deploy them on their `main`; don't re-vendor.

## Attribution

Do not add any AI assistant, Claude, Anthropic, or Co-Authored-By
attribution/trailers to commits, commit messages, PRs, or generated files.

Pull request titles and descriptions are going to a public GitHub repo, so
avoid using specific names or internal info unless explicitly stated to.
No AI/Claude/Anthropic/Co-Authored-By attribution in commits, messages, PRs, or
generated files. PR titles and descriptions go to a public repo: no internal info
or specific names unless explicitly stated.

## Collaboration Notes

The user uses speech to text occasionally, so if sentences are weird or words
are not right, infer the likely intent and ask only when needed.

Code is very cheap to write. Do not give time estimates; with agents, code is
practically instant to generate. Unless stated otherwise, time to implement is
not a blocker.

Never use em-dashes (the `—` character) anywhere: prose, docs, code comments,
commit messages, or PRs. Use commas, colons, parentheses, or separate sentences
instead.
- The user uses speech-to-text; infer likely intent from odd wording, ask only
when needed.
- Code is cheap to write: no time estimates, implementation time isn't a blocker.
- Never use em-dashes anywhere. Use commas, colons, parentheses, or separate
sentences.

## Reference Repos

Repos in `.reference`, such as Effect and effect-atom, are available for
patterns. If given a Git URL for reference, clone it into `.reference` and
inspect it there. Make sure to pull the latest changes from the reference repo
before using it.
Repos in `.reference` (Effect, effect-atom, …) are available for patterns. Clone
a given Git URL into `.reference` and pull latest before using it.

## Engineering Priorities

- Prefer correctness and predictable behavior over short-term convenience.
- Preserve runtime behavior when changing lint, typing, or test structure.
- Keep package boundaries clear; use public package exports instead of relative
imports across package roots.
- Extract shared logic only when the shared behavior is real and local patterns
support it. Avoid broad generic abstractions for one-off duplication.
- Keep package boundaries clear; use public package exports, not cross-package
relative imports.
- Extract shared logic only when the shared behavior is real; avoid broad generic
abstractions for one-off duplication.

## Package Roles

- `packages/core/sdk`: executor core contracts, plugin wiring, scopes, sources,
secrets, policies, and test fixtures. The `@executor-js/sdk/http-auth`
subpath carries the shared placements-based auth-method vocabulary the HTTP
protocol plugins compose (core itself never imports it — composition, not
location, keeps core carrier-agnostic).
- `packages/core/storage-*`: storage adapters and storage test support.
- `packages/plugins/*`: protocol and provider plugins. Plugin-specific
runtime, React, API, and testing helpers should live with the owning plugin.
- `packages/core/sdk`: core contracts, plugin wiring, scopes, sources, secrets,
policies, fixtures. `@executor-js/sdk/http-auth` carries the shared auth-method
vocabulary the HTTP protocol plugins compose (core never imports it, keeping it
carrier-agnostic).
- `packages/core/storage-*`: storage adapters and test support.
- `packages/plugins/*`: protocol and provider plugins; their runtime, React, API,
and testing helpers live with the owning plugin.
- `packages/react`: shared React UI and atom/client integration.
- `packages/hosts/mcp`: MCP host surface for exposing Executor through MCP.
- `packages/hosts/mcp`: MCP host surface.
- `packages/kernel/*`: execution runtimes and code execution substrate.
- `apps/local`, `apps/cloud`, `apps/cli`, and `apps/desktop`: product entry
points that compose the packages.
- `apps/{local,cloud,cli,desktop}`: product entry points composing the packages.

## Other

Please make note of mistakes you make in MISTAKES.md. If you find you wish you
had more context or tools, write that down in DESIRES.md. If you learn anything
about your env write that down in LEARNINGS.md.
Note mistakes in MISTAKES.md, missing context or tools in DESIRES.md, and env
learnings in LEARNINGS.md.