A Python tool for scraping rental listing data from ByOwner.com using the ScrapingAnt web scraping API. Built for educational purposes to demonstrate how to work with AJAX-based real estate search APIs, proxy-based web scraping, and structured data extraction.
Disclaimer: This project is intended for educational and research purposes only. Please review ByOwner.com's Terms of Service before use. The authors are not responsible for any misuse of this tool.
- Sends POST requests to ByOwner's AJAX search endpoint with geographic bounds and filters
- Routes requests through the ScrapingAnt proxy (`browser=false`, `proxy_country=US`) with custom headers (`X-Requested-With: XMLHttpRequest`) to trigger JSON responses
- Parses the JSON response from the `get_properties` array (20 listings per page)
- Handles page-based pagination via URL path segments (`/page-2`, `/page-3`, etc.)
- Deduplicates listings by `property_id` and exports to CSV/JSON
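
The request flow described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the ScrapingAnt endpoint URL and the `url` / `x-api-key` query parameter names are assumptions to verify against ScrapingAnt's documentation, and the target URL and form field names (`min_lat`, `max_lat`) are hypothetical placeholders. The request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed ScrapingAnt endpoint; verify against the current ScrapingAnt docs.
SCRAPINGANT_ENDPOINT = "https://api.scrapingant.com/v2/general"


def build_proxy_request(target_url, api_key, form_fields):
    """Build (but do not send) a POST request routed through ScrapingAnt."""
    query = urlencode({
        "url": target_url,        # page the proxy should fetch
        "x-api-key": api_key,     # assumed name of the API key parameter
        "browser": "false",       # plain HTTP request, no headless browser
        "proxy_country": "US",
    })
    headers = {
        # The "ant-" prefix asks the proxy to forward this header to the target.
        "ant-X-Requested-With": "XMLHttpRequest",
    }
    # Form-encoded body; urllib sets the form Content-Type when the request is sent.
    body = urlencode(form_fields).encode("utf-8")
    return Request(
        f"{SCRAPINGANT_ENDPOINT}?{query}",
        data=body,
        headers=headers,
        method="POST",
    )


# Hypothetical target URL and form fields, for illustration only.
request = build_proxy_request(
    "https://example.com/rentals/new-york/newburgh/page-2",
    "YOUR_SCRAPINGANT_API_KEY",
    {"min_lat": 40.9, "max_lat": 42.1},
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; the response body would then be parsed as JSON to reach the `get_properties` array.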

`pip install -r requirements.txt`

Sign up for a ScrapingAnt API key to get started.

python main.py \
--state new-york \
--city newburgh \
--bounds 40.919458373577875 42.09514162642213 -74.61040226374416 -73.44259773625583 \
--property-types homes condo townhouse \
--max-pages 3 \
--api-key "YOUR_SCRAPINGANT_API_KEY" \
--verbose

| Argument | Description |
|---|---|
| `--state` | State URL slug (e.g. new-york, california) |
| `--city` | City URL slug (e.g. newburgh, los-angeles) |
| `--bounds MIN_LAT MAX_LAT MIN_LON MAX_LON` | Geographic bounding box (4 floats) |


| Argument | Default | Description |
|---|---|---|
| `--property-types` | all | Property types (e.g. homes condo townhouse) |
| `--page` | `1` | Starting page number |
| `--max-pages` | `0` (all) | Max pages to fetch |
| `--min-price` / `--max-price` | - | Price range filter |
| `--min-bedrooms` / `--max-bedrooms` | - | Bedroom count filter |
| `--min-bathrooms` / `--max-bathrooms` | - | Bathroom count filter |
| `--min-sqft` / `--max-sqft` | - | Square footage filter |
| `--output` / `-o` | `output/byowner_rentals.csv` | Output CSV path |
| `--json` | `false` | Also export as JSON |
| `--api-key` | env `SCRAPINGANT_API_KEY` | ScrapingAnt API key |
| `--verbose` / `-v` | `false` | Enable verbose logging |

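
For readers curious how a command line like this can be declared, here is one possible `argparse` layout for the arguments listed above. It is a hypothetical reconstruction for illustration, not the contents of `main.py`; the price, bedroom, bathroom, and square footage filters are omitted for brevity.

```python
import argparse
import os


def build_parser():
    """Declare a subset of the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="Scrape ByOwner rental listings")
    parser.add_argument("--state", required=True, help="State URL slug")
    parser.add_argument("--city", required=True, help="City URL slug")
    # Four floats in the documented order: MIN_LAT MAX_LAT MIN_LON MAX_LON.
    parser.add_argument(
        "--bounds", required=True, nargs=4, type=float,
        metavar=("MIN_LAT", "MAX_LAT", "MIN_LON", "MAX_LON"),
    )
    parser.add_argument("--property-types", nargs="*", default=None)
    parser.add_argument("--page", type=int, default=1)
    parser.add_argument("--max-pages", type=int, default=0)  # 0 means all pages
    parser.add_argument("--output", "-o", default="output/byowner_rentals.csv")
    parser.add_argument("--json", action="store_true")
    # Fall back to the environment variable when the flag is not given.
    parser.add_argument("--api-key", default=os.environ.get("SCRAPINGANT_API_KEY"))
    parser.add_argument("--verbose", "-v", action="store_true")
    return parser


args = build_parser().parse_args([
    "--state", "new-york", "--city", "newburgh",
    "--bounds", "40.9", "42.1", "-74.6", "-73.4",
    "--max-pages", "3",
])
print(args.bounds, args.max_pages)
```

Because none of the declared options look like negative numbers, `argparse` accepts the negative longitudes as values rather than mistaking them for flags.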

| Field | Description |
|---|---|
| `property_id` | ByOwner property identifier |
| `mls_id` | MLS listing number |
| `property_type` | Residential Lease, etc. |
| `secondary_type` | House, Apartment, Condo, Townhouse, etc. |
| `address` | Full formatted address |
| `city`, `state`, `zip_code`, `county` | Location components |
| `price` | Monthly rent |
| `bedrooms`, `bathrooms` | Room counts |
| `sqft` | Living square footage |
| `lot_size` | Lot size (acres) |
| `year_built` | Year of construction |
| `lat`, `lon` | Geographic coordinates |
| `url` | Full ByOwner listing URL |
| `image_url` | Primary listing photo URL |
| `description` | Public remarks (truncated to 500 chars) |
| `status` | Listing status (Active, Coming Soon, etc.) |
| `listing_office` | Listing brokerage |
| `mls_source` | MLS data source (e.g. OneKey) |
| `last_updated` | MLS update timestamp |
| `date_scraped` | Timestamp of scrape |

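
Deduplicating by `property_id` and writing the fields above to CSV might look something like the sketch below. The helper names (`dedupe_by_property_id`, `write_csv`) and the sample rows are made up for illustration and are not taken from the project; only three of the documented fields are used to keep the example short.

```python
import csv
import io


def dedupe_by_property_id(listings):
    """Keep the first occurrence of each property_id, preserving order."""
    seen = set()
    unique = []
    for listing in listings:
        pid = listing.get("property_id")
        if pid in seen:
            continue
        seen.add(pid)
        unique.append(listing)
    return unique


def write_csv(listings, fieldnames, handle):
    """Write listings as CSV rows, truncating description to 500 characters."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for listing in listings:
        row = dict(listing)  # copy so the caller's data is not modified
        row["description"] = (row.get("description") or "")[:500]
        writer.writerow(row)


# Invented sample data: the second row repeats the first property_id.
rows = [
    {"property_id": "A1", "price": 2500, "description": "x" * 600},
    {"property_id": "A1", "price": 2500, "description": "duplicate"},
    {"property_id": "B2", "price": 1800, "description": None},
]
unique = dedupe_by_property_id(rows)
buffer = io.StringIO()
write_csv(unique, ["property_id", "price", "description"], buffer)
print(buffer.getvalue())
```

Writing to a real file instead of `io.StringIO` would use `open(path, "w", newline="", encoding="utf-8")`.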
Scrape all rental listings in the Newburgh, NY area:
python main.py \
--state new-york \
--city newburgh \
--bounds 40.919458373577875 42.09514162642213 -74.61040226374416 -73.44259773625583 \
--property-types homes condo townhouse \
--json \
--api-key "YOUR_KEY" \
--verbose

This project demonstrates several key concepts for educational purposes:

- AJAX API Discovery: How to identify POST-based AJAX endpoints behind server-rendered pages using browser developer tools
- Custom Header Injection: Using ScrapingAnt's `ant-` header prefix to send `X-Requested-With: XMLHttpRequest` and trigger JSON responses instead of HTML
- POST Request Proxying: Sending form-encoded POST bodies through a proxy API for authenticated AJAX calls
- Pagination via URL Paths: Handling page-based pagination where the page number is embedded in the URL path (`/page-N`) rather than query parameters
- Data Normalization: Transforming deeply nested MLS data with HTML fragments into clean, flat CSV datasets
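
To make the path-based pagination concrete, here is a small sketch. It assumes that page 1 lives at the bare base URL and that a page with fewer than 20 listings is the last one; both are assumptions for illustration, not confirmed behavior of the site or of this tool. The `fake_fetch` function stands in for the real network call, and the example.com URLs are placeholders.

```python
def page_url(base_url, page):
    """Return the URL for a page; page 1 is assumed to be the bare base URL."""
    base = base_url.rstrip("/")
    return base if page <= 1 else f"{base}/page-{page}"


def fetch_all(fetch_page, base_url, start_page=1, max_pages=0, page_size=20):
    """Collect listings page by page until a short page or the page limit."""
    listings = []
    page = start_page
    while True:
        batch = fetch_page(page_url(base_url, page))
        listings.extend(batch)
        fetched = page - start_page + 1
        # max_pages == 0 means "no limit", matching the documented default.
        if len(batch) < page_size or (max_pages and fetched >= max_pages):
            break
        page += 1
    return listings


def fake_fetch(url):
    """Stand-in for a real request: two full pages, then a short third page."""
    sizes = {
        "https://example.com/rentals": 20,
        "https://example.com/rentals/page-2": 20,
        "https://example.com/rentals/page-3": 5,
    }
    return [{"property_id": f"{url}#{i}"} for i in range(sizes.get(url, 0))]


all_listings = fetch_all(fake_fetch, "https://example.com/rentals")
first_page_only = fetch_all(fake_fetch, "https://example.com/rentals", max_pages=1)
print(len(all_listings), len(first_page_only))
```

Keeping the fetch function as a parameter makes the loop easy to test without network access or an API key.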