
danilotrix86/cloudflare-browser-rendering-python


Cloudflare Browser Rendering - Python Client

A production-ready Python client for the Cloudflare Browser Rendering REST API.

Extract content from any webpage as Markdown, HTML, screenshots, or structured JSON -- or crawl entire websites with a single call.


Features

  • Single-page endpoints (synchronous; typical response in 1-5 s):

    • /markdown - Convert any URL to clean Markdown
    • /content - Get fully rendered HTML (after JS execution)
    • /screenshot - Capture screenshots as PNG
    • /json - AI-powered structured data extraction (via Workers AI or BYO model)
  • Multi-page crawling (asynchronous):

    • /crawl - Crawl entire sites with auto-pagination, pattern filters, and progress tracking
  • Built-in resilience: Rate limit handling with Retry-After, automatic retries on transient failures (502/503/504), configurable timeouts, and typed errors
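The resilience behavior listed above can be sketched in isolation. This is an illustrative simplification, not the library's actual implementation: `send` stands in for a single HTTP attempt and is assumed to return `(status, headers, body)`.

```python
import time

TRANSIENT = {502, 503, 504}

def request_with_retries(send, max_retries=3, sleep=time.sleep):
    """Call `send()` until it succeeds, honoring Retry-After on 429
    and retrying transient 5xx errors with exponential backoff."""
    attempt = 0
    while True:
        status, headers, body = send()
        if status == 429 and attempt < max_retries:
            # Rate limited: wait exactly as long as the server asks.
            sleep(float(headers.get("Retry-After", 1)))
        elif status in TRANSIENT and attempt < max_retries:
            sleep(2 ** attempt)  # 1 s, 2 s, 4 s, ...
        else:
            return status, body
        attempt += 1
```

Injecting `sleep` makes the backoff behavior easy to unit-test without real waiting.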


Quick Start

1. Prerequisites

  • A Cloudflare Account (Free or Paid plan)
  • An API Token with Browser Rendering - Edit permissions
  • Your Account ID (visible in the dashboard URL or sidebar)

2. Installation

pip install .

Or for development (includes pytest, ruff, mypy):

pip install -e ".[dev]"

3. Configuration

Create a .env file in the project root:

CLOUDFLARE_API_TOKEN=your_api_token_here
CLOUDFLARE_ACCOUNT_ID=your_account_id_here
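The client falls back to these environment variables when no explicit credentials are passed (see the constructor reference below). A minimal sketch of that fallback order, not the library's own code:

```python
import os

def resolve_credentials(api_token=None, account_id=None):
    """Explicit arguments win; otherwise the environment variables
    (loaded from .env by the client) are used."""
    token = api_token or os.environ.get("CLOUDFLARE_API_TOKEN")
    account = account_id or os.environ.get("CLOUDFLARE_ACCOUNT_ID")
    if not token or not account:
        raise ValueError("Missing CLOUDFLARE_API_TOKEN or CLOUDFLARE_ACCOUNT_ID")
    return token, account
```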

4. Usage

from cloudflare_browser import CloudflareBrowser

browser = CloudflareBrowser()

Get Markdown from a URL (~1-3s)

markdown = browser.markdown(url="https://example.com")
print(markdown)

Get rendered HTML (~1-3s)

html = browser.content(url="https://example.com")

Take a screenshot (~2-5s)

browser.screenshot(url="https://example.com", output_path="screenshot.png")

Extract structured data with AI (~3-8s)

data = browser.json_extract(
    url="https://example.com",
    prompt="Extract the main heading and all links",
    response_format={
        "type": "json_schema",
        "schema": {
            "type": "object",
            "properties": {
                "heading": {"type": "string"},
                "links": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "text": {"type": "string"},
                            "url": {"type": "string"},
                        }
                    }
                }
            }
        }
    }
)

Crawl an entire site (~30s+)

pages = browser.crawl_and_collect(
    "https://example.com/docs",
    limit=50,
    formats=["markdown"],
    include_patterns=["**/docs/**"],
    exclude_patterns=["**/changelog/**"],
)

for page in pages:
    print(f"{page['url']}: {page['markdown'][:100]}...")
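The include/exclude options take `**`-style glob patterns. As a rough illustration of how such filters are commonly combined (must match an include pattern if any are given, and no exclude pattern), here is a sketch using Python's `fnmatch`; the service's actual matcher may differ in details:

```python
from fnmatch import fnmatch

def keep_url(url, include=(), exclude=()):
    """Keep a URL only if it matches at least one include pattern
    (when includes are given) and no exclude pattern."""
    if include and not any(fnmatch(url, p) for p in include):
        return False
    return not any(fnmatch(url, p) for p in exclude)
```

Note that `fnmatch` treats `*` as matching any characters including `/`, so `**/docs/**` behaves like "contains /docs/ somewhere in the path".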

Async Usage

For async/await support (FastAPI, Sanic, scripts with asyncio.run, etc.), install the async extra:

pip install "cloudflare-browser[async]"

Then use AsyncCloudflareBrowser -- the API is identical; just await every call:

from cloudflare_browser import AsyncCloudflareBrowser

async with AsyncCloudflareBrowser() as browser:
    # Single-page endpoints
    markdown = await browser.markdown(url="https://example.com")
    html = await browser.content(url="https://example.com")
    image = await browser.screenshot(url="https://example.com", output_path="shot.png")
    data = await browser.json_extract(url="https://example.com", prompt="Extract links")

    # Multi-page crawl
    pages = await browser.crawl_and_collect(
        "https://example.com/docs",
        limit=50,
        formats=["markdown"],
    )

API Reference

CloudflareBrowser class

Constructor

CloudflareBrowser(
    api_token=None,       # Falls back to CLOUDFLARE_API_TOKEN env var
    account_id=None,      # Falls back to CLOUDFLARE_ACCOUNT_ID env var
    timeout=30.0,         # HTTP request timeout in seconds
    max_retries=3,        # Retries on 502/503/504
    load_env=True,        # Load .env file automatically
)

Single-page methods (synchronous)

| Method | Returns | Description |
|---|---|---|
| `markdown(url=, html=)` | `str` | Converts a page to Markdown |
| `content(url=, html=)` | `str` | Returns fully rendered HTML |
| `screenshot(url=, html=, output_path=)` | `bytes` | Captures a screenshot (saves to disk if `output_path` set) |
| `json_extract(url=, prompt=, response_format=)` | `dict` | AI-powered structured data extraction |

All single-page methods accept either url or html as input, plus optional goto_options, wait_for_selector, user_agent, authenticate, cookies, and set_extra_http_headers.
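The exact value shapes for these options are not shown in this README; the sketch below is an assumption based on typical browser-rendering payloads, with the actual call left commented out. Check the library's docs for the structures it expects:

```python
# Hypothetical option values -- shapes are assumptions, names come
# from the parameter list above.
options = {
    "wait_for_selector": "#main",  # wait until this element exists
    "user_agent": "my-crawler/1.0",
    "cookies": [{"name": "session", "value": "abc123", "domain": "example.com"}],
    "set_extra_http_headers": {"Accept-Language": "en-US"},
}
# html = browser.content(url="https://example.com", **options)
```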

Crawl methods (asynchronous)

| Method | Returns | Description |
|---|---|---|
| `start_crawl(url, **options)` | `str` | Starts a crawl job, returns `job_id` |
| `get_results(job_id)` | `dict` | Fetches results (supports `cursor`, `limit`, `status`) |
| `wait_for_completion(job_id)` | `dict` | Polls until terminal status |
| `cancel_crawl(job_id)` | `bool` | Cancels a running job |
| `crawl_and_collect(url, **options)` | `list[dict]` | Starts, polls, and returns all records |
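When you need more control than crawl_and_collect offers, the lower-level methods can be combined by hand: start, wait, then page through results with the cursor. The sketch below uses an in-memory stub in place of the real client so it runs without credentials; the method names come from the table above, but the response shapes (`data`/`cursor` keys) are assumptions:

```python
class StubBrowser:
    """Stand-in for CloudflareBrowser with canned crawl responses."""
    def start_crawl(self, url, **options):
        return "job-1"

    def wait_for_completion(self, job_id):
        return {"status": "completed"}

    def get_results(self, job_id, cursor=None, limit=2):
        pages = [{"url": f"https://example.com/docs/p{i}"} for i in range(3)]
        start = int(cursor or 0)
        batch = pages[start:start + limit]
        next_cursor = str(start + limit) if start + limit < len(pages) else None
        return {"data": batch, "cursor": next_cursor}

def collect_all(browser, url):
    """Manual version of crawl_and_collect: start, poll, then page
    through results until the cursor is exhausted."""
    job_id = browser.start_crawl(url, limit=50, formats=["markdown"])
    browser.wait_for_completion(job_id)
    records, cursor = [], None
    while True:
        page = browser.get_results(job_id, cursor=cursor)
        records.extend(page["data"])
        cursor = page["cursor"]
        if cursor is None:
            return records
```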

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check .

# Type check
mypy src/

Rate Limits

| | Free Plan | Paid Plan ($5/mo) |
|---|---|---|
| Browser time | 10 min/day | Unlimited |
| REST API requests | 6/min | 600/min |
| Crawl jobs | 5/day | Unlimited |
| Pages per crawl | 100 | 100,000 |

Tip: Use render=False for static sites -- it skips the headless browser entirely and is not billed during the beta.
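On the Free plan's 6 requests/minute cap, simple client-side pacing (one request every 10 s) avoids 429s entirely rather than reacting to them. An illustrative throttle, not part of the library, with an injectable clock so it can be tested without real waiting:

```python
import time

class MinIntervalThrottle:
    """Space calls so a requests-per-minute cap is never exceeded,
    e.g. 6/min on the Free plan => one request every 10 seconds."""
    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        # Block until at least `interval` seconds since the last call.
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

Call `throttle.wait()` immediately before each `browser.markdown(...)` (or other endpoint) call.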


License

MIT License - see the LICENSE file for details.
