An agentic Playwright test generation pipeline using the Claude Agent SDK. Automatically generates Page Objects, writes end-to-end tests, executes them, and heals failures using AI agents.
This pipeline takes a test plan (YAML+Gherkin markdown) and a profile (project configuration + knowledge files) and orchestrates Claude agents to:
- Analyze the test plan and project structure
- Build Page Objects and Component classes from your application source
- Write Playwright test specs based on Gherkin scenarios
- Validate generated TypeScript code (lint, typecheck, structural checks)
- Execute tests using Playwright
- Heal test failures by analyzing errors and applying minimal fixes
- Python 3.11+
- Node.js 18+ (for Playwright)
- An existing Playwright TypeScript project
- Claude API key (ANTHROPIC_API_KEY environment variable)
# Clone and install
git clone <repo-url>
cd test-gen-pipeline
pip install -e .
# Run the full pipeline
pipeline run --profile openproject --plan tests/fixtures/sample_test_plan.md
# Or run individual steps
pipeline build-pom --profile openproject --plan tests/fixtures/sample_test_plan.md
pipeline write-test --profile openproject --plan tests/fixtures/sample_test_plan.md
pipeline validate --profile openproject --files "src/po/openproject/**/*.ts"
pipeline heal --profile openproject --run-id 20260513-120000-abcd1234
Enable debug logging with -v:
pipeline -v run --profile openproject --plan tests/fixtures/sample_test_plan.md
Test plans are markdown files with YAML frontmatter followed by Gherkin scenarios.
---
id: TEST-001
suite: Dashboard
feature: User Dashboard
component: dashboard
priority: P1
tags: [ui, regression]
variables:
  dashboardTitle: "My Dashboard"
setup:
  - login: { username: "admin", password: "admin" }
---
Background:
  Given the user is authenticated
  And the user is on the dashboard page

Scenario: View dashboard widgets
  When the user opens the dashboard
  Then they see the main widgets
  And the dashboard title displays "My Dashboard"

Scenario: Add a widget
  When the user clicks "Add Widget"
  And the user selects "Calendar" from the widget menu
  Then the calendar widget appears on the dashboard

Frontmatter fields and body sections:
- id - Unique test identifier
- suite - Test suite name
- feature - Feature description
- component - Maps to a directory in po_base_dir
- priority - P1, P2, or P3
- tags - Searchable test tags
- variables - Test data (supports <uuid4> for unique values)
- setup - API preconditions (optional)
- Background - Gherkin background steps
- Scenario - Test scenarios in Gherkin format
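A minimal sketch of how a plan like the one above can be split into its frontmatter and Gherkin sections using only the standard library. The function names (split_frontmatter, parse_scenarios, expand_uuid4) are hypothetical and are not the API of test_plan_parser.py; a real parser would hand the frontmatter string to a YAML library such as PyYAML, and the exact value substituted for <uuid4> (here a 32-character hex string) is an assumption.

```python
import re
import uuid

# Frontmatter is the text between the leading "---" line and the next "---" line.
FRONTMATTER_RE = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*\n(.*)\Z", re.DOTALL)


def split_frontmatter(text: str) -> tuple[str, str]:
    """Return (yaml_text, gherkin_body) for a test plan."""
    match = FRONTMATTER_RE.match(text)
    if match is None:
        raise ValueError("test plan must start with a '---' YAML frontmatter block")
    return match.group(1), match.group(2)


def parse_scenarios(body: str) -> dict[str, list[str]]:
    """Group Gherkin steps under "Background" or their scenario title."""
    sections: dict[str, list[str]] = {}
    current: str | None = None
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Background:"):
            current = "Background"
            sections[current] = []
        elif line.startswith("Scenario:"):
            current = line.removeprefix("Scenario:").strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)  # a Given/When/Then/And step
    return sections


def expand_uuid4(value: str) -> str:
    """Replace every <uuid4> placeholder with a fresh unique hex string (assumed format)."""
    return re.sub(r"<uuid4>", lambda _m: uuid.uuid4().hex, value)
```

Keeping frontmatter splitting separate from Gherkin grouping means each half can be validated on its own during ANALYZE_PLAN.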
Profiles define how the pipeline should work with your project. Each profile contains:
- Project paths (root, PO directory, test directory)
- Playwright base URL and authentication
- Validation commands and structural checks
- Knowledge files (embedded in agent prompts)
- Create profiles/<name>/profile.yaml:
name: myproject
stack: react
project_root: "/path/to/playwright-project"
po_base_dir: src/po/myproject
test_dir: tests/ui
barrel_export: internals.ts
fixture_path: tests/ui/fixtures.ts
base_url: "http://localhost:3000"
auth:
  strategy: session_cookie
  username: testuser
  password: testpass123
validation:
  lint_command: "eslint {files} --max-warnings=0"
  typecheck_command: "tsc --noEmit"
  structural_checks:
    - name: "extends BasePage"
      pattern: "extends BasePage<"
      file_glob: "src/po/**/*.ts"
      must_match: true
      error_message: "All page objects must extend BasePage<T>"
- Create knowledge files in profiles/<name>/knowledge/:
  - architecture.md - Project architecture and patterns
  - page-objects.md - Page Object class templates
  - components.md - Component class patterns
  - locator-patterns.md - How to find elements in your app
  - source-code-map.md - Where to find element IDs and source code
  - dom-quirks.md - Runtime DOM behavior specifics
  - test-structure.md - Test file conventions (overrides base)
Knowledge files are injected into agent system prompts to guide code generation.
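One plausible way to assemble knowledge for a prompt, assuming that files under profiles/<name>/knowledge/ replace same-named files in profiles/_base/knowledge/ (the override behavior mentioned for test-structure.md). collect_knowledge and render_knowledge_section are illustrative names, not functions from profile.py or prompt_builder.py, and the "## filename" section header is an arbitrary choice.

```python
from pathlib import Path


def collect_knowledge(profiles_dir: Path, profile: str) -> dict[str, str]:
    """Merge shared and profile-specific knowledge files by filename.

    Assumed rule: a file in <profile>/knowledge/ replaces the same-named
    file from _base/knowledge/; other base files are kept.
    """
    merged: dict[str, str] = {}
    for directory in (profiles_dir / "_base" / "knowledge",
                      profiles_dir / profile / "knowledge"):
        if not directory.is_dir():
            continue  # a profile may have no knowledge directory of its own
        for path in sorted(directory.glob("*.md")):
            merged[path.name] = path.read_text(encoding="utf-8")
    return merged


def render_knowledge_section(knowledge: dict[str, str], wanted: list[str]) -> str:
    """Concatenate the selected files, in order, into one prompt section."""
    parts = [
        f"## {name}\n\n{knowledge[name].strip()}"
        for name in wanted
        if name in knowledge  # silently skip files the profile does not provide
    ]
    return "\n\n".join(parts)
```

Each agent would pass its own `wanted` list, so the healer could receive locator-patterns.md and dom-quirks.md without the page-object templates.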
test-gen-pipeline/
├── src/pipeline/ # Main Python package
│   ├── cli.py # Click CLI entry point
│   ├── orchestrator.py # State machine orchestrator
│   ├── profile.py # Profile loading & merging
│   ├── prompt_builder.py # Jinja2 prompt rendering
│   ├── test_plan_parser.py # YAML+Gherkin parser
│   ├── pom_contract_extractor.py # TypeScript signature extraction
│   ├── validator.py # Lint, typecheck, structural checks
│   ├── classifier.py # Failure categorization
│   ├── state.py # JSON state persistence
│   ├── agents/
│   │   ├── base.py # AgentRunner wrapper
│   │   ├── pom_builder.py # Page Object builder agent
│   │   ├── test_writer.py # Test writer agent
│   │   └── healer.py # Failure healer agent
│   └── schemas/
│       ├── profile_schema.py # Configuration schemas
│       ├── run_state.py # Pipeline state schemas
│       ├── test_plan.py # Test plan schemas
│       └── pom_contract.py # POM contract schemas
├── prompt-templates/ # Jinja2 agent system prompts
│   ├── pom_builder.md.j2
│   ├── test_writer.md.j2
│   └── healer.md.j2
├── profiles/ # Profile configurations
│   ├── _base/knowledge/ # Shared knowledge files
│   └── openproject/ # OpenProject profile example
├── artifacts/ # Run outputs (state files)
├── tests/fixtures/ # Sample test plans
└── pyproject.toml # Package configuration
The pipeline progresses through these states:
ANALYZE_PLAN → CHECK_POM → [BUILD_POM → VALIDATE_POM →] WRITE_TEST
  → VALIDATE_TEST → RUN_TEST → DONE
                      ↓ (if tests fail)
               CLASSIFY_FAILURE
                 ↓          ↓
               HEAL       FAILED
                 ↓
          RUN_TEST (retry)
- ANALYZE_PLAN - Validates test plan format and content
- CHECK_POM - Checks if Page Object directories exist
- BUILD_POM - AI agent generates Page Objects and Components
- VALIDATE_POM - Lint, typecheck, structural validation
- WRITE_TEST - AI agent writes test specs
- VALIDATE_TEST - Validate generated tests
- RUN_TEST - Execute tests with Playwright
- CLASSIFY_FAILURE - Categorize errors and decide next action
- HEAL - AI agent applies minimal fixes to failing tests
- DONE/FAILED - Terminal states
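The transitions in the diagram can be sketched as a small pure function. This is a simplified model, not orchestrator.py: the max_heals limit of 3, routing a rebuild_pom action back to BUILD_POM (taken from the classifier's "Rebuild POM" action), and sending every validation failure straight to FAILED are assumptions.

```python
from enum import Enum, auto


class State(Enum):
    ANALYZE_PLAN = auto()
    CHECK_POM = auto()
    BUILD_POM = auto()
    VALIDATE_POM = auto()
    WRITE_TEST = auto()
    VALIDATE_TEST = auto()
    RUN_TEST = auto()
    CLASSIFY_FAILURE = auto()
    HEAL = auto()
    DONE = auto()
    FAILED = auto()


TERMINAL = {State.DONE, State.FAILED}

# Where each step goes when it succeeds.
ON_SUCCESS = {
    State.ANALYZE_PLAN: State.CHECK_POM,
    State.BUILD_POM: State.VALIDATE_POM,
    State.VALIDATE_POM: State.WRITE_TEST,
    State.WRITE_TEST: State.VALIDATE_TEST,
    State.VALIDATE_TEST: State.RUN_TEST,
    State.RUN_TEST: State.DONE,
    State.HEAL: State.RUN_TEST,
}


def next_state(state: State, *, ok: bool = True, pom_exists: bool = False,
               action: str = "abort", heal_attempts: int = 0,
               max_heals: int = 3) -> State:
    """Return the state that follows `state`, given the outcome of that step."""
    if state in TERMINAL:
        return state
    if state is State.CHECK_POM:
        # Skip the bracketed BUILD_POM/VALIDATE_POM steps when POs already exist.
        return State.WRITE_TEST if pom_exists else State.BUILD_POM
    if state is State.RUN_TEST and not ok:
        return State.CLASSIFY_FAILURE
    if state is State.CLASSIFY_FAILURE:
        if action == "heal" and heal_attempts < max_heals:
            return State.HEAL
        if action == "rebuild_pom":
            return State.BUILD_POM
        return State.FAILED  # "abort", unknown categories, or heal budget spent
    return ON_SUCCESS[state] if ok else State.FAILED


# Happy path for a plan whose Page Objects already exist.
state, path = State.ANALYZE_PLAN, [State.ANALYZE_PLAN]
while state not in TERMINAL:
    state = next_state(state, pom_exists=True)
    path.append(state)
```

Keeping the transition logic free of I/O makes it easy to unit-test and to resume: a persisted state name is enough to decide the next step.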
Three Claude agents drive the pipeline:
Page Object builder (src/pipeline/agents/pom_builder.py):
- Model: claude-sonnet-4-20250514
- Max turns: 30
- Responsibility: Generate Page Object and Component TypeScript classes
- Tools: Read, Write, Edit, Glob, Grep, Bash
- Knowledge: Architecture, page-objects, components, locators, source-code-map, browser-tools
Test writer (src/pipeline/agents/test_writer.py):
- Model: claude-sonnet-4-20250514
- Max turns: 20
- Responsibility: Write Playwright test specs from Gherkin scenarios
- Tools: Read, Write, Edit, Glob, Grep, Bash
- Knowledge: Test structure, fixtures, POM contracts
Failure healer (src/pipeline/agents/healer.py):
- Model: claude-sonnet-4-20250514
- Max turns: 8
- Responsibility: Fix failing tests with minimal patches
- Tools: Read, Edit, Bash, Glob, Grep
- Knowledge: Locators, DOM quirks, browser-tools, source-code-map
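For reference, the three agent configurations above can be captured as plain data. AgentSpec is a hypothetical container, not a Claude Agent SDK type, and the short agent names are assumptions based on the module names. Laid out this way, one design choice is easy to check: the healer is the only agent without the Write tool, which fits its minimal-patch role (it can still run Bash, so this is a nudge rather than a hard guarantee).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical container for the per-agent settings listed above."""
    name: str
    model: str
    max_turns: int
    tools: tuple[str, ...]


MODEL = "claude-sonnet-4-20250514"

AGENTS = {
    spec.name: spec
    for spec in (
        AgentSpec("pom_builder", MODEL, 30, ("Read", "Write", "Edit", "Glob", "Grep", "Bash")),
        AgentSpec("test_writer", MODEL, 20, ("Read", "Write", "Edit", "Glob", "Grep", "Bash")),
        AgentSpec("healer", MODEL, 8, ("Read", "Edit", "Bash", "Glob", "Grep")),
    )
}


def agents_without(tool: str) -> list[str]:
    """Names of agents whose tool list omits `tool`."""
    return [spec.name for spec in AGENTS.values() if tool not in spec.tools]
```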
The pipeline validates generated code at multiple stages:
Lint: runs your project's linter (e.g., eslint) on the generated files only.
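The profile's lint_command contains a {files} placeholder. A minimal sketch, assuming the validator substitutes the generated file paths for that placeholder and runs the command without a shell: splitting the template with shlex and expanding {files} into separate argv entries keeps paths with spaces intact. build_lint_argv is a hypothetical helper, and it only handles {files} as a standalone token.

```python
import shlex


def build_lint_argv(template: str, files: list[str]) -> list[str]:
    """Expand a command template such as "eslint {files} --max-warnings=0".

    Each generated file becomes its own argv entry, so no shell quoting is
    needed. Templates without {files} pass through unchanged.
    """
    argv: list[str] = []
    for token in shlex.split(template):
        if token == "{files}":
            argv.extend(files)
        else:
            argv.append(token)
    return argv
```

The resulting list could be passed to `subprocess.run(argv, cwd=project_root)`. A template without the placeholder, such as the typecheck_command "tsc --noEmit", is returned as-is, which matches tsc running project-wide.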
Typecheck: runs the TypeScript compiler (tsc) across the whole project to catch type errors.
Structural checks: regex-based validation that generated code follows project patterns, for example:
- All Page Objects extend BasePage<T>
- All Components extend BaseComponent
- Proper import statements
- Fixture usage patterns
Define structural checks in your profile's validation.structural_checks.
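A sketch of how one validation.structural_checks entry could be evaluated. It takes file contents as an in-memory mapping so it is easy to test; StructuralCheck mirrors the YAML keys from the example profile, but the class and function are hypothetical. fnmatch stands in for glob matching as an approximation: its * also crosses / boundaries, so src/po/**/*.ts matches any .ts file at least one directory below src/po/, which may differ from the pipeline's own glob semantics.

```python
import fnmatch
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuralCheck:
    """Mirrors one entry of validation.structural_checks in profile.yaml."""
    name: str
    pattern: str
    file_glob: str
    must_match: bool
    error_message: str


def run_structural_checks(files: dict[str, str],
                          checks: list[StructuralCheck]) -> list[str]:
    """Return "path: message" for every file that violates a check.

    `files` maps a project-relative POSIX path to that file's source text.
    """
    errors: list[str] = []
    for check in checks:
        regex = re.compile(check.pattern)
        for path, source in files.items():
            # fnmatch approximates glob here: "*" can also match "/".
            if not fnmatch.fnmatchcase(path, check.file_glob):
                continue
            matched = regex.search(source) is not None
            if matched != check.must_match:
                errors.append(f"{path}: {check.error_message}")
    return errors
```

Setting must_match to false turns the same mechanism into a ban list, for example rejecting a pattern that must never appear in generated code.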
When tests fail, the classifier categorizes errors:
| Category | Examples | Action |
|---|---|---|
| timeout | Waiting for element timeout | Heal (adjust waits/locators) |
| locator_not_found | Element not found, strict mode | Heal (fix selectors) |
| assertion_mismatch | Assertion failed | Heal (update expectations) |
| missing_method | Method doesn't exist | Rebuild POM (regenerate classes) |
| import_error | Module/import issues | Rebuild POM |
| infrastructure | Network/connection errors | Abort (cannot fix) |
| unknown | Unrecognized error | Abort |
Repeated identical failures are detected and cause immediate abort.
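A sketch of the classification step and the repeated-failure guard. The regexes are illustrative guesses keyed to common Playwright, Node.js, and TypeScript error text (such as "Timeout 30000ms exceeded" and "strict mode violation"), not the rules in classifier.py. Rule order matters: infrastructure errors are checked first so a refused connection is never sent to the healer. The signature masks digits before hashing, an assumed normalization so the same failure with a different timeout value or line number still counts as a repeat.

```python
import hashlib
import re

# Illustrative (category, pattern, action) rules, checked in order.
RULES: list[tuple[str, re.Pattern[str], str]] = [
    ("infrastructure", re.compile(r"net::ERR_|ECONNREFUSED|ENOTFOUND"), "abort"),
    ("import_error", re.compile(r"Cannot find module|has no exported member"), "rebuild_pom"),
    ("missing_method", re.compile(r"is not a function|Property '\w+' does not exist"), "rebuild_pom"),
    ("locator_not_found", re.compile(r"strict mode violation|element\(s\) not found"), "heal"),
    ("timeout", re.compile(r"Timeout \d+ms exceeded"), "heal"),
    ("assertion_mismatch", re.compile(r"expect\(.+\)\.\w+"), "heal"),
]


def classify(message: str) -> tuple[str, str]:
    """Return (category, action) for the first rule that matches."""
    for category, pattern, action in RULES:
        if pattern.search(message):
            return category, action
    return "unknown", "abort"


def failure_signature(category: str, message: str) -> str:
    """Hash the failure with digits masked so durations and line numbers don't matter."""
    normalized = re.sub(r"\d+", "N", message.strip())
    return hashlib.sha256(f"{category}:{normalized}".encode()).hexdigest()[:16]


class RepeatDetector:
    """Flags a failure whose signature was already seen in this run."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_repeat(self, signature: str) -> bool:
        if signature in self._seen:
            return True
        self._seen.add(signature)
        return False
```

When `is_repeat` returns True, the orchestrator would move straight to FAILED instead of spending another heal attempt on a fix that already did not work.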
# Full pipeline
pipeline run --profile <name> --plan <path> [--ci] [--resume <run-id>]
# Individual steps
pipeline build-pom --profile <name> --plan <path> [--ci]
pipeline write-test --profile <name> --plan <path> [--ci]
pipeline validate --profile <name> --files "<glob>"
pipeline heal --profile <name> --run-id <id> [--ci]
Options:
- -v, --verbose - Enable DEBUG logging
- --ci - Bypass interactive prompts (for CI/CD)
- --profiles-dir - Override profiles directory (default: profiles/)
- --artifacts-dir - Override artifacts directory (default: artifacts/)
- --resume - Resume a previous run by ID
Exit codes:
- 0 - Success (all tests passed)
- 1 - Failure (validation error, max retries exceeded, or unrecoverable error)
Environment variables:
- ANTHROPIC_API_KEY - Claude API key (required)
- PROJECT_ROOT - Override project root (optional)
See src/pipeline/schemas/profile_schema.py for the complete profile schema and its validation rules.
Generated code fails or is low quality:
- Check that the generated code is correct and executable
- Review the agent prompts and knowledge files for clarity
- Consider breaking complex test plans into smaller scenarios
Import or type errors:
- Verify that the PO barrel export path exists
- Ensure imports match your project structure
- Check the project's TypeScript configuration
Timeouts:
- Increase waits in locator-patterns.md
- Review dom-quirks.md for timing issues
- Check that the application is running at the configured base URL
Repeated failures:
- The pipeline detects repeated failures and aborts
- Review the error output and knowledge files
- You may need to inspect the application manually with playwright-cli
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
# Run CLI from source
python -m pipeline.cli run --profile openproject --plan tests/fixtures/sample_test_plan.md
To add a new profile:
- Create profiles/myprofile/profile.yaml
- Add knowledge files in profiles/myprofile/knowledge/
- Test with: pipeline run --profile myprofile --plan <test-plan>
Agent system prompts are in prompt-templates/:
- pom_builder.md.j2 - Page Object generation
- test_writer.md.j2 - Test writing
- healer.md.j2 - Failure fixing
Edit Jinja2 templates to adjust agent behavior.
See LICENSE file for details.
For issues and feature requests, see the issue tracker.