A browser-use agent built on the rule that makes browser agents safe to run: the model proposes one action, the code disposes. Claude decides what to do on a page; code validates every action against guardrails before it touches the browser, and logs every step.
A model driving a browser is exactly where you do not want the model in charge. So here it never is. The model gets a short, closed list of actions and proposes one per step; the agent parses it, checks it against an allowlist and a sensitive-field rule, and only then executes it. A blocked action is fed back as a reason and the model tries again, within a step budget. Every step is recorded, because a browser agent nobody can audit is a liability.
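The propose, check, execute, record cycle described above can be sketched in a few lines. This is a minimal illustration, not the code in app/agent.py: the names run_loop, Step and Trace, the dict-shaped action, and the out_of_steps outcome are all assumptions made for the example. The action kinds are the five that appear in the traces in this README.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Assumed vocabulary: the five action kinds visible in this README's traces.
ACTION_KINDS = {"click", "type", "navigate", "finish", "give_up"}


@dataclass
class Step:
    proposed: str
    ok: bool
    detail: str


@dataclass
class Trace:
    outcome: str = "out_of_steps"  # hypothetical name for an exhausted budget
    steps: list = field(default_factory=list)


def run_loop(
    propose: Callable[[Optional[str]], dict],
    check: Callable[[dict], Optional[str]],
    execute: Callable[[dict], str],
    max_steps: int = 5,
) -> Trace:
    """The brain proposes; code validates, executes and records every step."""
    trace = Trace()
    feedback = None  # reason for the last block, shown to the brain next turn
    for _ in range(max_steps):
        action = propose(feedback)
        kind = action.get("kind")
        if kind not in ACTION_KINDS:
            reason = f"unknown action {kind!r}"
        else:
            reason = check(action)  # None means the guardrails allow it
        if reason is not None:
            # Blocked: record it, feed the reason back, never touch the browser.
            trace.steps.append(Step(str(kind), False, reason))
            feedback = reason
            continue
        if kind in ("finish", "give_up"):
            trace.outcome = "finished" if kind == "finish" else "gave_up"
            trace.steps.append(Step(kind, True, trace.outcome))
            return trace
        trace.steps.append(Step(kind, True, execute(action)))
        feedback = None
    return trace  # budget exhausted


# Usage: a scripted brain tries a blocked action, then gives up.
script = iter([{"kind": "type", "ref": "e3", "text": "secret"}, {"kind": "give_up"}])
seen_feedback = []
executed = []


def scripted_brain(feedback):
    seen_feedback.append(feedback)
    return next(script)


def deny_typing(action):
    return "typing is not allowed here" if action["kind"] == "type" else None


def fake_execute(action):
    executed.append(action)
    return "done"


trace = run_loop(scripted_brain, deny_typing, fake_execute)
```

The point of the shape is that execute is only reachable after check returns None, so a blocked action costs one step of the budget and nothing else.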
pip install playwright && playwright install chromium
python main.py demo
This serves a bundled sandbox store and drives it with a key-free heuristic brain. A real captured run, against a real Chromium browser:
goal: Find the price of the Standing Desk.
outcome: finished
result: (demo brain) found $389.00
step 1 @ http://127.0.0.1:.../index.html
proposed: click
ok (clicked e1 (Products))
step 2 @ http://127.0.0.1:.../products.html
proposed: click
ok (clicked e2 (Standing Desk))
step 3 @ http://127.0.0.1:.../desk.html
proposed: finish
ok (finished)
The agent started on the store home, walked to Products, opened the Standing
Desk page, read the price off it, and stopped. The demo brain is a labelled
heuristic so the whole loop runs for free; with python main.py run the same
loop is driven by Claude.
Give the agent a goal that needs a password or a trip off-site, and it
refuses. This is a real captured run on the sandbox (reproduce it with
python examples/safety_demo.py), where a scripted brain tries exactly
those two things:
goal: Log into the account
outcome: gave_up
step 1 @ .../login.html
proposed: type
BLOCKED (typing into a credential or payment field ('Password') is not allowed)
step 2 @ .../login.html
proposed: navigate
BLOCKED (navigation to 'https://example.com/steal' is outside the allowed sites; stay on the allowed hosts)
step 3 @ .../login.html
proposed: give_up
ok (gave up)
Three rules, enforced in code, that the model cannot talk its way around:
- Domain allowlist. The agent only acts on hosts you allow. A navigate off the list is refused, and after every action the agent re-checks where the page actually landed, because a click can redirect.
- No credential or payment fields. The agent never types into a password, card, CVV or similar field. This is decided by the input's type and autocomplete attributes, not by guessing from its label, so a password box named pwd or a card field labelled only by an external <label> is still caught. Reading a page is fine; entering a secret is a line a demo agent does not cross.
- A step budget. A confused agent stops, instead of clicking forever and billing tokens.
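As a rough sketch of the second rule, the check below classifies an input from its type and autocomplete attributes only. The function name is_sensitive_field and the attribute-dict shape are assumptions for illustration, and the real gate in app/guardrails.py may look at more signals. The autocomplete tokens are standard HTML values.

```python
# Standard HTML autocomplete tokens that mark a credential or one-time code.
CREDENTIAL_TOKENS = {"current-password", "new-password", "one-time-code"}


def is_sensitive_field(attrs: dict) -> bool:
    """Return True if an input should never be typed into.

    attrs is a hypothetical dict of the element's HTML attributes,
    e.g. {"type": "password", "name": "pwd"}.
    """
    input_type = (attrs.get("type") or "").strip().lower()
    if input_type == "password":
        return True
    # autocomplete may hold several space-separated tokens,
    # e.g. "section-a billing cc-number"; payment tokens start with "cc-".
    tokens = (attrs.get("autocomplete") or "").lower().split()
    return any(t in CREDENTIAL_TOKENS or t.startswith("cc-") for t in tokens)


# A password box whose name gives nothing away is still caught by its type.
pwd_box = {"type": "password", "name": "pwd"}
# A card field is caught by autocomplete even though its type is plain text.
card_box = {"type": "text", "autocomplete": "section-a billing cc-number"}
# An ordinary search box is left alone.
search_box = {"type": "text", "name": "q"}
```

Because the decision never reads the field's name, renaming a password input does not change the outcome.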
A click that opens a new tab is followed, so the off-site check vets the new tab rather than being fooled by the original page staying put. One honest limit: a server-side redirect can land on an off-allowlist page before the agent can intervene. The agent halts the run the moment it detects it is off the allowed hosts, but by then the page has already loaded. The allowlist prevents the agent from acting there, not the browser from arriving.
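The allowlist works at two points: before a navigate is executed, and again on whatever URL the page actually ended up at. A minimal sketch of both checks follows, assuming exact hostname matching. The helper names host_of, check_navigate and landed_off_site are hypothetical, and the project's own matching rules may differ.

```python
from typing import Optional
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased hostname of a URL, or "" if it has none."""
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:  # malformed URL, e.g. a broken IPv6 literal
        return ""


def check_navigate(url: str, allowed_hosts: set) -> Optional[str]:
    """Pre-action check: a block reason, or None if the target is allowed."""
    if host_of(url) in allowed_hosts:
        return None
    return f"navigation to {url!r} is outside the allowed sites"


def landed_off_site(current_url: str, allowed_hosts: set) -> bool:
    """Post-action check: did the page end up on a host that is not allowed?"""
    return host_of(current_url) not in allowed_hosts


allowed = {"127.0.0.1"}
blocked_reason = check_navigate("https://example.com/steal", allowed)
# The text before "@" is userinfo, not the host, so this URL is off-site.
tricked = landed_off_site("https://127.0.0.1@evil.example/", allowed)
```

Comparing parsed hostnames rather than searching the URL string is what keeps look-alike URLs from slipping through. The post-action check can only report that the browser has already arrived somewhere else, which is the limit described above.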
export ANTHROPIC_API_KEY=sk-...
python main.py run \
--goal "Find the price of the cheapest standing desk" \
--url https://example.com/ --allow example.com
The agent may only visit the starting URL's host plus any --allow host you
add. --show opens the browser so you can watch. The run prints the same step
trace as the demo and exits 0 when it finishes the goal.
app/actions.py     the closed action vocabulary + a strict parser
app/guardrails.py  the safety gate: allowlist, sensitive fields, off-site
app/browser.py     one Browser interface; FakeBrowser (tests) + Playwright
app/perception.py  a page -> the compact, ref-addressed view the model sees
app/llm.py         the brain: Claude, a key-free heuristic, a scripted double
app/agent.py       the four-beat loop and the Trace it returns
sandbox/           a tiny static store the demo drives, safe and offline
The model only ever refers to elements by a ref (e1, e2, ...) taken from the snapshot it was shown, so it cannot invent a selector that points somewhere it should not. Actions, guardrails and the browser are three separate gates: an action that parses can still be blocked, and one that is allowed can still fail in the browser, each handled on its own.
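To illustrate the first gate, here is a sketch of a strict parser that accepts only known action kinds with exactly the expected fields, and only refs that were present in the snapshot. It assumes the model replies with a JSON object. The field names (kind, ref, text, url, result) and the function parse_action are assumptions for this example, not the format app/actions.py actually uses.

```python
import json

# Assumed shape: each action kind and the exact extra fields it must carry.
FIELDS = {
    "click": {"ref"},
    "type": {"ref", "text"},
    "navigate": {"url"},
    "finish": {"result"},
    "give_up": set(),
}


def parse_action(raw: str, known_refs) -> dict:
    """Parse one proposed action, raising ValueError on anything unexpected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from None
    if not isinstance(data, dict):
        raise ValueError("action must be a JSON object")
    kind = data.get("kind")
    if not isinstance(kind, str) or kind not in FIELDS:
        raise ValueError(f"unknown action {kind!r}")
    expected = FIELDS[kind] | {"kind"}
    if set(data) != expected:
        raise ValueError(f"{kind} takes exactly the fields {sorted(expected)}")
    if not all(isinstance(value, str) for value in data.values()):
        raise ValueError("all fields must be strings")
    # A ref is only valid if it came from the snapshot the model was shown.
    if "ref" in data and data["ref"] not in known_refs:
        raise ValueError(f"unknown ref {data['ref']!r}")
    return data


def rejects(raw: str, known_refs=frozenset({"e1", "e2"})) -> bool:
    """Helper for the examples: True if parse_action refuses the input."""
    try:
        parse_action(raw, known_refs)
    except ValueError:
        return True
    return False


parsed = parse_action('{"kind": "click", "ref": "e2"}', {"e1", "e2"})
```

Passing this parser says nothing about whether the action is safe. It only means the guardrails now have a well-formed action to judge.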
python -m pytest
32 tests. Most run with no API key, no browser and no network (the action
parser rejecting unknown or malformed actions, each guardrail blocking what
it should, the loop reaching a goal, an invalid action fed back instead of
executed, the budget stopping a runaway, a brain failure ending cleanly, a
redirect off the allowlist halting the run, and the heuristic brain driving
the fake store end to end). Six of them are real-DOM contract tests that
drive an actual Chromium against real HTML (skipped if Playwright is not
installed), because the fake browser used elsewhere can only check what its
fixtures put there. Those six pin down what the real snapshot must emit: a
password field detected by its type even when its name gives nothing away, a
card field labelled only by an external <label>, a payment field marked by
autocomplete, and a typed value read back from the live DOM. They exist
because an earlier version passed its fake-browser tests while the real
credential check could be bypassed.
This is a focused agent for read-and-navigate goals (find a value, follow a
path, report it), not a tool for bypassing logins or automating actions
behind authentication. The default posture is conservative: an explicit host
allowlist, no credential entry, a step budget, and a full audit trail. The
bundled demo runs only against its own local sandbox. Point run at sites
you own or are authorized to automate, and respect their terms of service.
