vinimabreu/web-pilot

web-pilot

A browser-use agent built on the rule that makes browser agents safe to run: the model proposes one action, the code disposes. Claude decides what to do on a page; code validates every action against guardrails before it touches the browser, and logs every step.

A model driving a browser is exactly where you do not want the model in charge. So here it never is. The model gets a short, closed list of actions and proposes one per step; the agent parses it, checks it against an allowlist and a sensitive-field rule, and only then executes it. A blocked action is fed back as a reason and the model tries again, within a step budget. Every step is recorded, because a browser agent nobody can audit is a liability.
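The propose, check, execute, record cycle described above can be sketched roughly as follows. This is a minimal illustration, not the code in app/agent.py: the dict-shaped action, the `run` function, the `Step` record and the `out_of_steps` outcome name are all assumptions made for the example. Only the action names and the `finished` / `gave_up` outcomes come from the traces in this README.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical action shape: a plain dict such as {"action": "click", "ref": "e1"}.
Action = dict

# Actions that end the run, mapped to the outcome names seen in the traces.
TERMINAL = {"finish": "finished", "give_up": "gave_up"}


@dataclass
class Step:
    proposed: str   # the action name the model proposed
    blocked: bool   # True if a guardrail refused it
    detail: str     # the block reason, or what the browser did


def run(
    propose: Callable[[Optional[str]], Action],
    check: Callable[[Action], Optional[str]],
    execute: Callable[[Action], str],
    max_steps: int = 10,
) -> Tuple[str, List[Step]]:
    """Drive one goal: the brain proposes, code checks, then executes."""
    trace: List[Step] = []
    feedback: Optional[str] = None
    for _ in range(max_steps):              # the step budget
        action = propose(feedback)          # the model proposes one action
        reason = check(action)              # guardrails decide, not the model
        if reason is not None:
            trace.append(Step(action["action"], True, reason))
            feedback = reason               # fed back so the model can retry
            continue
        detail = execute(action)            # only now does it touch the browser
        trace.append(Step(action["action"], False, detail))
        feedback = None
        if action["action"] in TERMINAL:
            return TERMINAL[action["action"]], trace
    return "out_of_steps", trace            # hypothetical outcome name


# A scripted brain that tries two forbidden things, then gives up.
script = iter([
    {"action": "type", "ref": "e1", "text": "secret"},
    {"action": "navigate", "url": "https://example.com/steal"},
    {"action": "give_up"},
])


def demo_check(action: Action) -> Optional[str]:
    # Stand-in guardrail: refuse every type and navigate action.
    if action["action"] in ("type", "navigate"):
        return f"{action['action']} is not allowed in this demo"
    return None


outcome, trace = run(lambda feedback: next(script), demo_check, lambda a: "done")
```

A blocked action costs a step but never reaches `execute`, which is the property the rest of the design leans on.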

architecture

See it drive (no API key)

pip install playwright && playwright install chromium
python main.py demo

This serves a bundled sandbox store and drives it with a key-free heuristic brain. A real captured run, against a real Chromium browser:

goal: Find the price of the Standing Desk.
outcome: finished
result: (demo brain) found $389.00

  step 1 @ http://127.0.0.1:.../index.html
    proposed: click
    ok (clicked e1 (Products))
  step 2 @ http://127.0.0.1:.../products.html
    proposed: click
    ok (clicked e2 (Standing Desk))
  step 3 @ http://127.0.0.1:.../desk.html
    proposed: finish
    ok (finished)

The agent started on the store home, walked to Products, opened the Standing Desk page, read the price off it, and stopped. The demo brain is a labelled heuristic so the whole loop runs for free; with python main.py run the same loop is driven by Claude.

The guardrails are the point

Give the agent a goal that needs a password or a trip off-site, and it refuses. This is a real captured run on the sandbox (reproduce it with python examples/safety_demo.py), where a scripted brain tries exactly those two things:

goal: Log into the account
outcome: gave_up

  step 1 @ .../login.html
    proposed: type
    BLOCKED (typing into a credential or payment field ('Password') is not allowed)
  step 2 @ .../login.html
    proposed: navigate
    BLOCKED (navigation to 'https://example.com/steal' is outside the allowed sites; stay on the allowed hosts)
  step 3 @ .../login.html
    proposed: give_up
    ok (gave up)

Three rules, enforced in code, that the model cannot talk its way around:

  • Domain allowlist. The agent only acts on hosts you allow. A navigate off the list is refused, and after every action the agent re-checks where the page actually landed, because a click can redirect.
  • No credential or payment fields. The agent never types into a password, card, CVV or similar field. This is decided by the input's type and autocomplete attributes, not by guessing from its label, so a password box named pwd or a card field labelled only by an external <label> is still caught. Reading a page is fine; entering a secret is a line a demo agent does not cross.
  • A step budget. A confused agent stops, instead of clicking forever and billing tokens.

A click that opens a new tab is followed, so the off-site check vets the new tab rather than being fooled by the original page staying put. One honest limit: a server-side redirect can land on an off-allowlist page before the agent can stop it. The agent halts the run as soon as it detects it is off the allowed hosts, but by then the page has already loaded. The allowlist stops the agent from acting there; it does not stop the browser from arriving.
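The first two rules can be sketched as pure functions over a URL and an input's attributes. This is an illustration under stated assumptions, not the contents of app/guardrails.py: the function names, the exact-host matching policy and the particular set of autocomplete tokens treated as sensitive are choices made for this example. The tokens themselves are standard HTML autofill values, and the refusal messages mirror the captured trace above.

```python
from typing import Optional, Set
from urllib.parse import urlsplit

# Assumed policy: these standard autocomplete tokens mark a secret, and so
# does any payment token, all of which start with "cc-" (cc-number, cc-csc).
SECRET_AUTOCOMPLETE = {"current-password", "new-password", "one-time-code"}


def host_allowed(url: str, allowed_hosts: Set[str]) -> bool:
    """Exact host match (an assumption): no subdomain or suffix matching."""
    host = urlsplit(url).hostname      # lowercased by urlsplit; None if absent
    return host is not None and host in allowed_hosts


def check_navigation(url: str, allowed_hosts: Set[str]) -> Optional[str]:
    """Return a refusal reason, or None if the navigation may proceed."""
    if host_allowed(url, allowed_hosts):
        return None
    return (f"navigation to '{url}' is outside the allowed sites; "
            "stay on the allowed hosts")


def is_sensitive_field(input_type: Optional[str],
                       autocomplete: Optional[str]) -> bool:
    """Decide from the input's attributes, never from its visible label."""
    if (input_type or "").strip().lower() == "password":
        return True
    # autocomplete may hold several space-separated tokens.
    tokens = (autocomplete or "").lower().split()
    return any(t in SECRET_AUTOCOMPLETE or t.startswith("cc-") for t in tokens)


def check_type(label: str, input_type: Optional[str],
               autocomplete: Optional[str]) -> Optional[str]:
    """Return a refusal reason, or None if typing into the field is fine."""
    if is_sensitive_field(input_type, autocomplete):
        return (f"typing into a credential or payment field ('{label}') "
                "is not allowed")
    return None
```

Under this sketch, the post-action re-check is just `host_allowed(current_url, allowed_hosts)` called on whatever URL the browser reports after a click, which is why a redirect is caught on arrival rather than prevented.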

Run it on a real site

export ANTHROPIC_API_KEY=sk-...
python main.py run \
  --goal "Find the price of the cheapest standing desk" \
  --url https://example.com/ --allow example.com

The agent may only visit the starting URL's host plus any --allow host you add. --show opens the browser so you can watch. The run prints the same step trace as the demo and exits 0 when it finishes the goal.
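One way to derive the allowed host set from those flags is sketched below. It is not the project's actual argument handling: `build_parser` and `allowed_hosts` are hypothetical names, and treating --allow as repeatable is an assumption. Only the flag names come from the command above.

```python
import argparse
from typing import Iterable, Optional, Set
from urllib.parse import urlsplit


def build_parser() -> argparse.ArgumentParser:
    # Flag names follow the README; everything else here is assumed.
    parser = argparse.ArgumentParser(prog="main.py run")
    parser.add_argument("--goal", required=True)
    parser.add_argument("--url", required=True)
    parser.add_argument("--allow", action="append", metavar="HOST",
                        help="extra host the agent may visit (repeatable)")
    parser.add_argument("--show", action="store_true",
                        help="open a visible browser window")
    return parser


def allowed_hosts(start_url: str,
                  extra: Optional[Iterable[str]] = None) -> Set[str]:
    """The starting URL's host, plus any explicitly allowed hosts."""
    start = urlsplit(start_url).hostname
    if start is None:
        raise ValueError(f"start URL has no host: {start_url!r}")
    extras = {h.strip().lower() for h in (extra or []) if h.strip()}
    return {start} | extras


args = build_parser().parse_args([
    "--goal", "Find the price of the cheapest standing desk",
    "--url", "https://example.com/",
    "--allow", "shop.example.com",
])
hosts = allowed_hosts(args.url, args.allow)
```

Starting from an explicit set, rather than a pattern, keeps the default conservative: a host is reachable only if someone named it.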

How it is built

app/actions.py     the closed action vocabulary + a strict parser
app/guardrails.py  the safety gate: allowlist, sensitive fields, off-site
app/browser.py     one Browser interface; FakeBrowser (tests) + Playwright
app/perception.py  a page -> the compact, ref-addressed view the model sees
app/llm.py         the brain: Claude, a key-free heuristic, a scripted double
app/agent.py       the four-beat loop and the Trace it returns
sandbox/           a tiny static store the demo drives, safe and offline

The model only ever refers to elements by a ref (e1, e2, ...) taken from the snapshot it was shown, so it cannot invent a selector that points somewhere it should not. Actions, guardrails and the browser are three separate gates: an action that parses can still be blocked, and one that is allowed can still fail in the browser, each handled on its own.
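A strict parser over a closed vocabulary, with refs resolved against the current snapshot, might look like the sketch below. The JSON wire format, the per-action field names and the `ActionError` and `parse_action` names are assumptions for illustration; the README does not specify how app/actions.py encodes actions. The five action names are the ones that appear in the traces above.

```python
import json
from typing import Dict, Set


class ActionError(ValueError):
    """Raised when the model's output is not a valid, known action."""


# Assumed shape: each action name maps to the exact fields it must carry.
FIELDS: Dict[str, Set[str]] = {
    "click": {"ref"},
    "type": {"ref", "text"},
    "navigate": {"url"},
    "finish": {"result"},
    "give_up": set(),
}


def parse_action(raw: str, known_refs: Set[str]) -> Dict[str, str]:
    """Parse one proposed action, rejecting anything outside the vocabulary."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ActionError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ActionError("an action must be a JSON object")
    kind = data.get("action")
    if not isinstance(kind, str) or kind not in FIELDS:
        raise ActionError(f"unknown action: {kind!r}")
    expected = FIELDS[kind] | {"action"}
    if set(data) != expected:
        # Extra keys are rejected too, so a stray "selector" cannot sneak in.
        raise ActionError(f"{kind} takes exactly {sorted(expected)}")
    if not all(isinstance(value, str) for value in data.values()):
        raise ActionError("all action fields must be strings")
    if "ref" in data and data["ref"] not in known_refs:
        # Only refs from the snapshot the model was shown are addressable.
        raise ActionError(f"unknown ref: {data['ref']!r}")
    return data


# Refs as they would appear in a snapshot of the sandbox home page.
snapshot_refs = {"e1", "e2"}
ok = parse_action('{"action": "click", "ref": "e2"}', snapshot_refs)
```

A parse failure in this sketch is an exception the loop can turn into feedback for the model, separate from a guardrail refusal and separate again from a browser error.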

Tests

python -m pytest

32 tests. Most run with no API key, no browser and no network. They cover:

  • the action parser rejecting unknown or malformed actions
  • each guardrail blocking what it should
  • the loop reaching a goal
  • an invalid action fed back instead of executed
  • the budget stopping a runaway
  • a brain failure ending cleanly
  • a redirect off the allowlist halting the run
  • the heuristic brain driving the fake store end to end

Six of the 32 are real-DOM contract tests that drive an actual Chromium against real HTML (skipped if Playwright is not installed), because the fake browser used elsewhere can only check what its fixtures put there. Those six pin down what the real snapshot must emit: a password field detected by its type even when its name gives nothing away, a card field labelled only by an external <label>, a payment field marked by autocomplete, and a typed value read back from the live DOM. They exist because an earlier version passed its fake-browser tests while the real credential check could be bypassed.

Scope and safety notes

This is a focused agent for read-and-navigate goals (find a value, follow a path, report it), not a tool for bypassing logins or automating actions behind authentication. The default posture is conservative: an explicit host allowlist, no credential entry, a step budget, and a full audit trail. The bundled demo runs only against its own local sandbox. Point run at sites you own or are authorized to automate, and respect their terms of service.
