Add llm-endpoint parameter to sandbox commands#2811
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds ChangesSandbox endpoint support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a new --llm-endpoint option to the ramalama sandbox command, allowing users to specify an external OpenAI-compatible endpoint instead of starting a local model server. However, there are critical issues with the current implementation: prepending http:// to args.llm_endpoint in the OpenCode configuration results in a malformed URL, and several runtime errors occur when --llm-endpoint is provided, including potential dryrun crashes, uninitialized container names, and invalid network namespace arguments being passed to the agent container. Restructuring the initialization and execution logic in run_sandbox is recommended to address these issues.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
a801088 to
8fc29f9
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/ramalama-sandbox-goose.1.md.in`:
- Around line 46-47: The documentation for the @@option llm-endpoint section
needs to be updated to reflect the behavioral changes when this option is used.
Currently, the description still states that two containers are always started
and the model server is always cleaned up, but this is no longer true when
--llm-endpoint is provided. Update the description text under @@option
llm-endpoint to document how providing this option changes the default behavior
regarding container startup and model server cleanup.
In `@docs/ramalama-sandbox-opencode.1.md.in`:
- Around line 44-45: In the DESCRIPTION section following the @@option
llm-endpoint header, add explicit clarification that when the --llm-endpoint
option is provided, the local model server is bypassed and not started or
removed. Instead, the specified external endpoint is used for LLM operations.
This will prevent operators from being misled into thinking the local server
behavior applies regardless of whether this option is set.
In `@ramalama/sandbox.py`:
- Line 198: The baseURL construction on line 198 is prepending "http://" to
args.llm_endpoint, but args.llm_endpoint is already a full URL (set on line
235), resulting in malformed URLs like "http://http://localhost:8080/v1". Fix
this by removing the "http://" prefix from the baseURL field and use
args.llm_endpoint directly, changing "baseURL": f"http://{args.llm_endpoint}/v1"
to "baseURL": f"{args.llm_endpoint}/v1" to avoid the double scheme prefix.
- Around line 228-236: The dryrun early return (checking args.dryrun and
returning after calling agent.engine.dryrun()) happens before the llm_endpoint
initialization code that sets args.port and args.llm_endpoint. Move the
initialization code that calls compute_serving_port(args) and sets
args.llm_endpoint to f"http://localhost:{args.port}" to occur before the dryrun
check, so that when dryrun exits early, it has valid endpoint values instead of
None. This ensures the dryrun command executes with proper default endpoint
configuration.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 7ca7cb18-acd7-4025-8048-e69be0870a30
📒 Files selected for processing (3)
docs/ramalama-sandbox-goose.1.md.indocs/ramalama-sandbox-opencode.1.md.inramalama/sandbox.py
8fc29f9 to
8c7e23f
Compare
8c7e23f to
22bb211
Compare
22bb211 to
6e614b3
Compare
6e614b3 to
feb999a
Compare
feb999a to
607a4e2
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
docs/options/llm-endpoint.md (1)
5-6: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winClarify the
--llm-endpointbehavior when set.The text currently only mentions the localhost default. It should also say that providing a URL skips sandbox model-server startup/cleanup and uses the supplied endpoint directly, so the manpage matches the runtime behavior.
Based on the PR summary: the endpoint now bypasses sandbox startup/cleanup when present.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/options/llm-endpoint.md` around lines 5 - 6, The documentation for the `--llm-endpoint` option currently only describes the default behavior when the endpoint is not provided. Update the description to clarify that when a URL is supplied to `--llm-endpoint`, it bypasses the sandbox model-server startup and cleanup processes and instead uses the supplied endpoint directly. This will ensure the documentation accurately reflects the runtime behavior described in the PR summary.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ramalama/sandbox.py`:
- Line 203: The baseURL construction at line 203 unconditionally appends "/v1"
to args.url, which causes a double "/v1" suffix when the user provides an
endpoint already ending with "/v1" (as can happen in lines 233-237). Modify the
baseURL assignment to check if args.url already ends with "/v1" before appending
it. If it does, use args.url directly; if it does not, append "/v1". Apply the
same logic to the other endpoint construction in lines 233-237 to ensure
consistent handling across both code paths.
- Around line 98-99: The code uses inconsistent checks for the llm_endpoint
argument: line 98 checks `is None` while line 233 uses truthiness (`if not
args.llm_endpoint`). When an empty string is passed via `--llm-endpoint ""`,
these checks behave differently, causing the network container setup to be
skipped when it should be applied. Standardize the endpoint-presence check by
using `is None` in both locations (the conditional at line 98 where add_args is
called for network setup and the conditional around line 233-237 in the
add_network() method). This ensures consistent semantics where only truly
missing endpoints are treated as "not provided".
In `@test/e2e/test_sandbox.py`:
- Line 100: The docstring at the test function describes workdir directory
mounting behavior, but the test actually validates `--port` localhost URL
construction. Update the docstring to accurately reflect what the test is
verifying, removing the reference to workdir mounting and instead describing
that it validates `--port` localhost URL construction.
- Line 91: The docstring at line 91 incorrectly describes the test as validating
workdir mount behavior with the --workdir flag, but the actual test validates
--llm-endpoint URL propagation. Update the docstring to accurately reflect what
the test actually validates, removing the reference to --workdir and
--workdir=/work, and instead describing that it validates the --llm-endpoint URL
is properly propagated.
---
Nitpick comments:
In `@docs/options/llm-endpoint.md`:
- Around line 5-6: The documentation for the `--llm-endpoint` option currently
only describes the default behavior when the endpoint is not provided. Update
the description to clarify that when a URL is supplied to `--llm-endpoint`, it
bypasses the sandbox model-server startup and cleanup processes and instead uses
the supplied endpoint directly. This will ensure the documentation accurately
reflects the runtime behavior described in the PR summary.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e04de863-b8ac-4014-9d2c-23328f82fdbc
📒 Files selected for processing (7)
docs/options/llm-endpoint.mddocs/ramalama-sandbox-goose.1.md.indocs/ramalama-sandbox-opencode.1.md.indocs/ramalama-sandbox.1.mdramalama/sandbox.pytest/e2e/test_sandbox.pytest/unit/test_sandbox_cmd.py
✅ Files skipped from review due to trivial changes (2)
- docs/ramalama-sandbox.1.md
- docs/ramalama-sandbox-opencode.1.md.in
607a4e2 to
7c6a877
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/options/llm-endpoint.md`:
- Around line 1-4: The metadata comment lines at the beginning of the file are
using invalid ATX heading syntax by missing a required space after the hash
symbols. In the four lines starting with the comment (lines 1-4), change each
instance of `####>` to `#### >` by adding a space between the four hash symbols
and the greater-than symbol to comply with markdown syntax rules and resolve the
markdownlint MD018 error.
In `@test/e2e/test_sandbox.py`:
- Line 91: There is a typo in the docstring that documents the test for the llm
endpoint flag. The docstring contains the misspelled flag name `--llm-endpint`
which should be corrected to `--llm-endpoint`. Locate the docstring in the test
file and fix the spelling error so that the flag name matches the actual
command-line argument.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e9c7bedf-3b59-424a-b046-0d255b3f4c67
📒 Files selected for processing (7)
docs/options/llm-endpoint.mddocs/ramalama-sandbox-goose.1.md.indocs/ramalama-sandbox-opencode.1.md.indocs/ramalama-sandbox.1.mdramalama/sandbox.pytest/e2e/test_sandbox.pytest/unit/test_sandbox_cmd.py
✅ Files skipped from review due to trivial changes (3)
- docs/ramalama-sandbox-opencode.1.md.in
- docs/ramalama-sandbox-goose.1.md.in
- docs/ramalama-sandbox.1.md
🚧 Files skipped from review as they are similar to previous changes (2)
- test/unit/test_sandbox_cmd.py
- ramalama/sandbox.py
7c6a877 to
62d2022
Compare
| ####> ramalama sandbox goose, ramalama sandbox opencode | ||
| ####> If this file is edited, make sure the changes | ||
| ####> are applicable to all of those. | ||
| #### **--llm-endpoint**= |
There was a problem hiding this comment.
Rather then this, why not just use the second param.
ramalama sandbox opencode # Use localhost with computed or given port.
ramalama sandbox opencode MODEL # Current behavior
ramalama sandbox opencode URL # connect to the openAI compatable endpoint currently running at the specified url.
There was a problem hiding this comment.
I see.
Currently the model parameter is required as of now in the main branch like this
ramalama sandbox goose [-h] [-c CTX_SIZE] [--max-tokens MAX_TOKENS] [-p PORT]
[--runtime-args RUNTIME_ARGS] [--host HOST] [--temp TEMP]
[--backend {auto,cuda}] [--cache-reuse CACHE_REUSE] [--ngl NGL]
[--ncmoe NCMOE] [--logfile LOGFILE] [--thinking THINKING]
[--model-draft MODEL_DRAFT] [-t THREADS] [--webui {on,off}]
[--goose-image GOOSE_IMAGE] [-w WORKDIR] [--llm-endpoint LLM_ENDPOINT]
[--authfile AUTHFILE] [--device DEVICE] [--env ENV] [--image IMAGE]
[--keep-groups] [-n NAME] [--network NETWORK] [--oci-runtime OCI_RUNTIME]
[--privileged] [--pull {always,missing,never,newer}] [--seed SEED]
[--selinux SELINUX] [--tls-verify TLSVERIFY]
MODEL [ARGS ...]
extending this to do something like
ramalama sandbox goose [-h] .....
MODEL [LLM_ENDPOINT] [ARGS ...]
There I am not sure how to distinguish the LLM_ENDPOINT from the first of the ARGS !?
OR maybe the first param is either MODEL or the LLM_ENDPOINT and then try to retrieve the model from the endpoint itself. But what to do if I there are more then one model available at the endpoint - then I would need a model parameter to pick the right model from the endpoint ?!
Though I like the idea to simplify the parameters I do not see how to implement this. @rhatdan any idea ?
maybe something like
MODEL[@LLM_ENDPOINT]
where the first parameter is either hf://unsloth/Qwen3.6-27B-GGUF:IQ4_XS or unsloth_Qwen3.6-27B-GGUF_IQ4_XS@http://yoda:8321
If I look at the chat command then I see that the model is a --model option. ALso there is the --list option to see the models from the endpoint, so I can pick the naming of the model.
There was a problem hiding this comment.
The MODEL argument is currently required, and can still be relevant when connecting to a arbitrary endpoint since that endpoint may be serving more than one model. I think the --llm-endpoint option makes more sense.
There was a problem hiding this comment.
The MODEL@LLM_ENDPOINT format is interesting, is there any precedent for that being used elsewhere?
There was a problem hiding this comment.
Well you can distinguish between arguments and options, since options will begin with - or --. I was thinking of the case where model could be inferred as the default on a currently running server.
Not sure how this currently works, but if I run two sandboxes, does it launch two model servers? Is that expected, or would it be better to have both sandboxes work with the same server, if the model is the same?
I don't mind the @ syntax would work. But I would still want to use the default model potentially.
ramalama sandbox goose https://openai ....
ramalama sandbox pi https://hr.redhat.com/ai ...
For example, would I need the user to figure out the model that is running?
There was a problem hiding this comment.
Right now, if you run two sandboxes it runs two different model servers. I think it's necessary because you could have a situation where:
- user pulls model A
- user starts agent session using model A
- user pulls model B
- user starts agent session using model B
There is currently no way to "add" model B to the model server serving the first agent session.
There was a problem hiding this comment.
no - I just come up the the idea <model> <at> <endpoint> could be middlepath. But it kind of looks ugly with the https:// and http:// prefix of the endpoint.
On the otherhand something like MODEL [@ ENDPOINT] [ARGS] would do to have an handle that there is an endpoint in the list.
After this PR maybe the two more parameters are needed. The above mentioned --list for the model list (and their names) and --api-key for the endpoint itself.
There was a problem hiding this comment.
just realized that the API key is needed right away.
@mikebonnet @rhatdan What about having
- --api-key
- --url
the same options the ramalama chat command has ? That would allow to share the options docu and keeps things more uniform.
|
|
||
| args.port = compute_serving_port(args) | ||
|
|
||
| model = New(args.MODEL, args) |
There was a problem hiding this comment.
I think it would make more sense to handle the llm_endpoint case first. If llm_endpoint is specified, the local models are likely irrelevant. Don't call New(args.MODEL, args) at all, just pass args.MODEL to agent_cls(), call agent.run(), and exit. After that, handle the normal case.
|
Could you add a non-dryrun |
62d2022 to
ea822b6
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/test_sandbox.py`:
- Around line 274-317: The e2e currently only proves sandbox can answer, not
that it actually uses the external LLM endpoint. Update the
`RamalamaExecWorkspace` flow around `ramalama sandbox` to make fallback
impossible or detectable, so the test fails if `--llm-endpoint` is ignored and a
local server is started instead. Add an assertion in the `sandbox`/`serve` test
path that specifically verifies the external endpoint is being used, using the
existing `container_name`, `container_port`, and
`requests.get`/`ctx.check_output` flow.
- Around line 103-106: The dry-run endpoint assertion in
test_sandbox_dryrun_llm_endpoint is too permissive because a trailing-slash
fixture lets the OpenCode path produce a malformed base URL while still matching
the substring check. Normalize the --llm-endpoint fixture before asserting, and
make the test validate the exact final URL shape for each agent so run_sandbox()
and OpenCode.add_env_options() cannot hide a double-slash /v1 concatenation bug.
- Around line 1-7: The e2e sandbox test is calling requests.get(...).json()
against /models without waiting for the server to become ready or setting a
timeout, which can make startup flaky or hang. Update the test in
test_sandbox.py to explicitly poll /models until it responds successfully before
decoding JSON, and add a timeout to the request path used by the test so
failures surface quickly. Use the existing test flow around ramalama serve -d
and the /models request to locate the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 12a58de2-e21b-4df1-a5da-5fbabf66ac19
📒 Files selected for processing (40)
docs/options/api-key.mddocs/options/authfile.mddocs/options/backend.mddocs/options/cache-reuse.mddocs/options/ctx-size.mddocs/options/device.mddocs/options/env.mddocs/options/help.mddocs/options/host.mddocs/options/image.mddocs/options/keep-groups.mddocs/options/llm-endpoint.mddocs/options/logfile.mddocs/options/max-tokens.mddocs/options/model-draft.mddocs/options/name.mddocs/options/ncmoe.mddocs/options/network.mddocs/options/ngl.mddocs/options/oci-runtime.mddocs/options/port.mddocs/options/privileged.mddocs/options/pull.mddocs/options/runtime-args.mddocs/options/seed.mddocs/options/selinux.mddocs/options/temp.mddocs/options/thinking.mddocs/options/threads.mddocs/options/tls-verify.mddocs/options/webui.mddocs/options/workdir.mddocs/ramalama-chat.1.mddocs/ramalama-sandbox-goose.1.md.indocs/ramalama-sandbox-opencode.1.md.indocs/ramalama-sandbox-pi.1.md.indocs/ramalama-sandbox.1.mdramalama/sandbox.pytest/e2e/test_sandbox.pytest/unit/test_sandbox_cmd.py
✅ Files skipped from review due to trivial changes (3)
- docs/ramalama-sandbox-pi.1.md.in
- docs/ramalama-sandbox.1.md
- docs/ramalama-sandbox-goose.1.md.in
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/ramalama-sandbox-opencode.1.md.in
- ramalama/sandbox.py
ea822b6 to
d6ebe40
Compare
Signed-off-by: Christian Meier <meier.kristian@gmail.com>
Signed-off-by: Christian Meier <chri5tian@posteo.de>
Signed-off-by: Christian Meier <chri5tian@posteo.de>
d6ebe40 to
1227044
Compare
Signed-off-by: Christian Meier <chri5tian@posteo.de>
2a9f120 to
32605fd
Compare
|
Added also the PI sandbox in here but there were some code on checking the port which I removed as with the llm-endpoint option it made not so much sense anymore - cc @bmahabirbu Since the pi-agent image is not on quay.io: https://quay.io/repository/ramalama/pi-agent?tab=tags |
addresses #2559