Skip to content

fix: assign distinct exit codes per failure category (resolves F17 collision)#1223

Draft
DaveHanns wants to merge 1 commit into
masterfrom
fix/exit-code-collisions-f17
Draft

fix: assign distinct exit codes per failure category (resolves F17 collision)#1223
DaveHanns wants to merge 1 commit into
masterfrom
fix/exit-code-collisions-f17

Conversation

@DaveHanns

Copy link
Copy Markdown
Contributor

Summary

The CommandExitCodes enum at src/lib/consts.ts:74-91 currently collapses three semantically-distinct failure categories onto exit code 1, two onto 3, and two onto 5 — with an eslint-disable-next-line @typescript-eslint/no-duplicate-enum-values directive admitting the collision was deliberate.

The collapse means callers (CI scripts, agent harnesses, MCP servers, eval pipelines) cannot tell from the exit code alone what category of failure occurred, forcing brittle stderr string-matching to answer basic questions like "did this fail because of auth, or because of code?" and "should I retry, or fail fast?"

This PR assigns distinct exit codes per category, preserving existing values where possible and using BSD sysexits.h conventions for the semantic ones.

New enum

export enum CommandExitCodes {
    BuildFailed = 1,         // unchanged
    BuildTimedOut = 2,        // unchanged
    BuildAborted = 3,         // unchanged
    NoFilesToPush = 4,        // unchanged
    InvalidInput = 5,         // unchanged
    RunFailed = 6,            // was 1; distinct from BuildFailed
    RunAborted = 7,           // was 3; distinct from BuildAborted
    MissingAuth = 77,         // was 1; BSD sysexits.h EX_NOPERM
    InvalidActorJson = 78,    // was 5; BSD sysexits.h EX_CONFIG
    NotFound = 250,           // unchanged
    NotImplemented = 255,     // unchanged
}

The previously-required eslint-disable @typescript-eslint/no-duplicate-enum-values directive at the top of the file is removed (no more duplicates).

Why these specific values

  • Preserved (1-5, 250, 255): any caller that already keyed on a specific code keeps the same behavior. The most common failure mode (BuildFailed = 1) is unchanged.
  • RunFailed = 6, RunAborted = 7: next-free sequential codes. The "Run*" categories logically pair with "Build*" but should be distinct since they describe distinct execution phases.
  • MissingAuth = 77, InvalidActorJson = 78: BSD sysexits.h EX_NOPERM and EX_CONFIG respectively. Self-documenting to anyone familiar with POSIX sysexits.

Backwards-compatibility

  • Callers that only check exitCode !== 0 are unaffected.
  • Callers that key on exitCode === 1 for "BuildFailed" still get the right answer (BuildFailed is still 1).
  • The only callers affected are those that specifically distinguished one of the previously-colliding categories from "code 1 means anything" — and since the codes were indistinguishable before, no such caller can exist.

What other CLIs do

Tool Convention
gh (GitHub CLI) 0 success, 1 generic, 2 cancelled, 4 auth required — the most widely-copied modern pattern
AWS CLI 0, 1, 2 usage error, 130 Ctrl+C, 252 syntax error, 253 env vars, 254 service error
curl 90+ distinct codes (one per error class)
git 0, 1, 128 fatal, 129 usage error
Docker 125 daemon, 126 not executable, 127 not found, 128+N signal-killed
npm documented per-error-class codes (127, 134 SIGABRT, 137 SIGKILL/OOM, ...)

Distinct error categories getting distinct exit codes is the de-facto modern standard. The current apify-cli MissingAuth = BuildFailed = RunFailed = 1 collapse violates the pattern that gh, AWS CLI, curl, git, Docker, and npm all follow.

Sources

Test plan

  • tsc --noEmit passes cleanly
  • No call-site changes needed (all 11 references use enum names, not numeric values)
  • No existing test pins to literal exit code values (the new propagates-non-zero-exit-code.test.ts checks Actor-exit-code passthrough, not the CLI's enum)
  • Reviewer eyeballs the new code values + decides if the 77/78 sysexits choice is acceptable vs. e.g. compact 6/7/8/9 (both options are defensible)

Filed as draft pending review of the code-assignment strategy.

…llision)

The `CommandExitCodes` enum previously collapsed three semantically-distinct
failure categories onto exit code 1 (MissingAuth = BuildFailed = RunFailed),
two onto code 3 (BuildAborted = RunAborted), and two onto code 5 (InvalidInput
= InvalidActorJson) — with an `eslint-disable-next-line no-duplicate-enum-values`
directive admitting the collision was deliberate. The collapse means callers
(CI scripts, agent harnesses, MCP servers, eval pipelines) cannot tell from
the exit code alone what category of failure occurred, forcing brittle stderr
string-matching.

This commit assigns distinct exit codes per category:

- `BuildFailed = 1` — unchanged
- `BuildTimedOut = 2` — unchanged
- `BuildAborted = 3` — unchanged
- `NoFilesToPush = 4` — unchanged
- `InvalidInput = 5` — unchanged
- `RunFailed = 6` — was 1; distinct from BuildFailed
- `RunAborted = 7` — was 3; distinct from BuildAborted
- `MissingAuth = 77` — was 1; BSD `sysexits.h` `EX_NOPERM` per `man 3 sysexits`
- `InvalidActorJson = 78` — was 5; BSD `sysexits.h` `EX_CONFIG`
- `NotFound = 250` — unchanged
- `NotImplemented = 255` — unchanged

Codes for the semantic categories (auth, config) follow the BSD `sysexits.h`
convention so the values are self-documenting to anyone familiar with the
POSIX sysexits taxonomy.

Backwards-compatibility: callers that only check `exitCode !== 0` are
unaffected. The most common existing failure (BuildFailed = 1) keeps its
value. Only callers that specifically distinguished one of the previously-
colliding categories from "code 1 means anything" are affected — and since
they couldn't actually distinguish before, no caller can have depended on
the collision.

The `eslint-disable @typescript-eslint/no-duplicate-enum-values` directive
at the top of `src/lib/consts.ts` is removed (no more duplicates).

This change does not require call-site updates — all 11 existing call sites
use the enum names, not the numeric values. `tsc --noEmit` passes cleanly.

Related: see https://github.com/apify/agentic-actor-dev-eval/blob/main/FINDINGS.md#f17--apify-cli-collapses-missingauth--buildfailed--runfailed-all-into-exit-code-1-preventing-programmatic-distinction-between-failure-categories
for the full survey including standard references (POSIX, sysexits.h) and
how other CLIs (gh, AWS CLI, curl, git, Docker, npm) handle this.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants