kami4ka/ByOwnerScraper

ByOwnerScraper

A Python tool for scraping rental listing data from ByOwner.com using the ScrapingAnt web scraping API. It was built for educational purposes, to demonstrate how to work with AJAX-based real estate search APIs, proxy-based web scraping, and structured data extraction.

Disclaimer: This project is intended for educational and research purposes only. Please review ByOwner.com's Terms of Service before use. The authors are not responsible for any misuse of this tool.

How It Works

  1. Sends POST requests to ByOwner's AJAX search endpoint with geographic bounds and filters
  2. Routes requests through the ScrapingAnt proxy (browser=false, proxy_country=US) and adds a custom X-Requested-With: XMLHttpRequest header so that ByOwner returns JSON instead of HTML
  3. Parses listings from the get_properties array of the JSON response (20 listings per page)
  4. Handles page-based pagination via URL path segments (/page-2, /page-3, etc.)
  5. Deduplicates listings by property_id and exports them to CSV (and, with --json, to JSON)
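Steps 1, 2 and 4 can be sketched as a pure function that assembles the proxied request without sending it. This is a minimal sketch, not the project's code. The ScrapingAnt endpoint https://api.scrapingant.com/v2/general and its x-api-key query parameter are assumptions based on ScrapingAnt's public v2 API. BYOWNER_SEARCH_BASE and the form field names (min_lat, max_lat, min_lon, max_lon) are hypothetical placeholders, since this README does not give the real AJAX path or parameter names. The assumption that page 1 carries no /page-N suffix is likewise an assumption.

```python
from urllib.parse import urlencode

# Assumption: ScrapingAnt's v2 "general" endpoint; check the current ScrapingAnt docs.
SCRAPINGANT_ENDPOINT = "https://api.scrapingant.com/v2/general"
# Hypothetical: the real ByOwner AJAX search path is not given in this README.
BYOWNER_SEARCH_BASE = "https://www.byowner.com/search"


def page_url(state: str, city: str, page: int = 1) -> str:
    """Target URL for a results page; page 1 is assumed to have no /page-N suffix."""
    base = f"{BYOWNER_SEARCH_BASE}/{state}/{city}"
    return base if page <= 1 else f"{base}/page-{page}"


def build_proxy_request(api_key, state, city, bounds, page=1, filters=None):
    """Assemble (but do not send) a POST routed through ScrapingAnt to ByOwner."""
    min_lat, max_lat, min_lon, max_lon = bounds
    params = {
        "url": page_url(state, city, page),
        "x-api-key": api_key,
        "browser": "false",      # plain HTTP fetch, no headless browser
        "proxy_country": "US",   # route through a US proxy
    }
    # Headers prefixed with "ant-" are forwarded by ScrapingAnt to the target site,
    # so ByOwner sees X-Requested-With: XMLHttpRequest and answers with JSON.
    headers = {"ant-X-Requested-With": "XMLHttpRequest"}
    # Hypothetical form field names for the bounding box; real names may differ.
    form = {
        "min_lat": min_lat,
        "max_lat": max_lat,
        "min_lon": min_lon,
        "max_lon": max_lon,
    }
    form.update(filters or {})
    return {
        "url": f"{SCRAPINGANT_ENDPOINT}?{urlencode(params)}",
        "headers": headers,
        "data": form,  # form-encoded by the HTTP client, e.g. requests.post(data=...)
    }
```

With the requests library installed, the result could be sent as requests.post(req["url"], headers=req["headers"], data=req["data"], timeout=60). Whether ScrapingAnt forwards the POST body unchanged is assumed from step 1 above; keeping the builder free of I/O makes that assumption easy to test in isolation.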

Setup

pip install -r requirements.txt

Sign up for a ScrapingAnt API key to get started, then pass it with --api-key or set the SCRAPINGANT_API_KEY environment variable.

Usage

python main.py \
  --state new-york \
  --city newburgh \
  --bounds 40.919458373577875 42.09514162642213 -74.61040226374416 -73.44259773625583 \
  --property-types homes condo townhouse \
  --max-pages 3 \
  --api-key "YOUR_SCRAPINGANT_API_KEY" \
  --verbose

Required Arguments

Argument                                  Description
--state                                   State URL slug (e.g. new-york, california)
--city                                    City URL slug (e.g. newburgh, los-angeles)
--bounds MIN_LAT MAX_LAT MIN_LON MAX_LON  Geographic bounding box (4 floats)

Optional Arguments

Argument                           Default                     Description
--property-types                   all                         Property types to include (e.g. homes condo townhouse)
--page                             1                           Starting page number
--max-pages                        0                           Maximum number of pages to fetch (0 = all pages)
--min-price / --max-price          none                        Price range filter
--min-bedrooms / --max-bedrooms    none                        Bedroom count filter
--min-bathrooms / --max-bathrooms  none                        Bathroom count filter
--min-sqft / --max-sqft            none                        Square footage filter
--output / -o                      output/byowner_rentals.csv  Output CSV path
--json                             false                       Also export as JSON
--api-key                          env SCRAPINGANT_API_KEY     ScrapingAnt API key
--verbose / -v                     false                       Enable verbose logging
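The two argument tables map directly onto Python's argparse. The sketch below is one possible parser consistent with the documented flags and defaults, not the project's actual main.py; using float for every min/max filter and None to mean "all property types" are assumptions.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented main.py flags."""
    parser = argparse.ArgumentParser(
        description="Scrape ByOwner.com rental listings via ScrapingAnt"
    )
    # Required location arguments
    parser.add_argument("--state", required=True, help="State URL slug, e.g. new-york")
    parser.add_argument("--city", required=True, help="City URL slug, e.g. newburgh")
    parser.add_argument(
        "--bounds", required=True, nargs=4, type=float,
        metavar=("MIN_LAT", "MAX_LAT", "MIN_LON", "MAX_LON"),
        help="Geographic bounding box",
    )
    # Optional search and paging arguments
    parser.add_argument("--property-types", nargs="+", default=None,
                        help="Property types to include (default: all)")
    parser.add_argument("--page", type=int, default=1, help="Starting page number")
    parser.add_argument("--max-pages", type=int, default=0,
                        help="Maximum pages to fetch (0 = all)")
    # Range filters: --min-price/--max-price, --min-bedrooms/--max-bedrooms, etc.
    for name in ("price", "bedrooms", "bathrooms", "sqft"):
        parser.add_argument(f"--min-{name}", type=float, default=None)
        parser.add_argument(f"--max-{name}", type=float, default=None)
    # Output, authentication and logging
    parser.add_argument("-o", "--output", default="output/byowner_rentals.csv")
    parser.add_argument("--json", action="store_true", help="Also export as JSON")
    parser.add_argument("--api-key", default=os.environ.get("SCRAPINGANT_API_KEY"),
                        help="Defaults to the SCRAPINGANT_API_KEY environment variable")
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser
```

Negative longitudes such as -74.6 are accepted as values for --bounds because argparse treats strings that look like negative numbers as arguments when no option itself looks like one. A caller would still need to check args.api_key and call parser.error(...) when neither the flag nor the environment variable supplies a key.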

Output Fields

Field                          Description
property_id                    ByOwner property identifier
mls_id                         MLS listing number
property_type                  Property type (e.g. Residential Lease)
secondary_type                 Property subtype (e.g. House, Apartment, Condo, Townhouse)
address                        Full formatted address
city, state, zip_code, county  Location components
price                          Monthly rent
bedrooms, bathrooms            Room counts
sqft                           Living area (square feet)
lot_size                       Lot size (acres)
year_built                     Year of construction
lat, lon                       Geographic coordinates
url                            Full ByOwner listing URL
image_url                      Primary listing photo URL
description                    Public remarks (truncated to 500 characters)
status                         Listing status (e.g. Active, Coming Soon)
listing_office                 Listing brokerage
mls_source                     MLS data source (e.g. OneKey)
last_updated                   MLS update timestamp
date_scraped                   Timestamp of the scrape
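Steps 3 and 5 of How It Works, together with the field list above, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the tool's implementation: the response is assumed to be a JSON object with a top-level get_properties array, and each listing is assumed to have already been mapped to the output field names, because ByOwner's raw MLS keys are not documented here. The helper names (listings_from_response, dedupe, finalize, to_csv, to_json) are hypothetical.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Column order follows the Output Fields table.
FIELDS = [
    "property_id", "mls_id", "property_type", "secondary_type", "address",
    "city", "state", "zip_code", "county", "price", "bedrooms", "bathrooms",
    "sqft", "lot_size", "year_built", "lat", "lon", "url", "image_url",
    "description", "status", "listing_office", "mls_source", "last_updated",
    "date_scraped",
]


def listings_from_response(body: str) -> list:
    """Return the listing objects from the get_properties array of a JSON page."""
    return json.loads(body).get("get_properties", [])


def dedupe(rows):
    """Keep the first row seen for each property_id, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        pid = row.get("property_id")
        if pid in seen:
            continue
        seen.add(pid)
        unique.append(row)
    return unique


def finalize(row, scraped_at=None):
    """Project a row onto FIELDS, truncate description, stamp date_scraped."""
    out = {name: row.get(name, "") for name in FIELDS}
    out["description"] = (out["description"] or "")[:500]
    out["date_scraped"] = scraped_at or datetime.now(timezone.utc).isoformat()
    return out


def to_csv(rows) -> str:
    """Serialize finalized rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows) -> str:
    """Serialize finalized rows as a pretty-printed JSON array."""
    return json.dumps(rows, indent=2)
```

Deduplicating across all fetched pages, rather than per page, guards against the same listing appearing on two pages if results shift between requests.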

Example

Scrape all home, condo, and townhouse rental listings in the Newburgh, NY area and also export them as JSON:

python main.py \
  --state new-york \
  --city newburgh \
  --bounds 40.919458373577875 42.09514162642213 -74.61040226374416 -73.44259773625583 \
  --property-types homes condo townhouse \
  --json \
  --api-key "YOUR_KEY" \
  --verbose

Learning Objectives

This project demonstrates several key concepts for educational purposes:

  • AJAX API Discovery: How to identify POST-based AJAX endpoints behind server-rendered pages using browser developer tools
  • Custom Header Injection: Using ScrapingAnt's ant- header prefix to send X-Requested-With: XMLHttpRequest and trigger JSON responses instead of HTML
  • POST Request Proxying: Sending form-encoded POST bodies through a proxy API for authenticated AJAX calls
  • Pagination via URL Paths: Handling page-based pagination where the page number is embedded in the URL path (/page-N) rather than query parameters
  • Data Normalization: Transforming deeply nested MLS data with HTML fragments into clean, flat CSV datasets
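The data normalization objective can be illustrated with two small helpers: one that reduces an HTML fragment (such as public remarks) to plain text, and one that flattens nested dictionaries into a single level suitable for CSV columns. This is a sketch using the standard library's html.parser, not the project's own code; the dotted key naming (e.g. "address.city") is an assumption chosen for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; etc. into characters
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Strip tags from an HTML fragment and collapse runs of whitespace."""
    if not fragment:
        return ""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    # Join chunks with spaces so text on either side of a tag stays separated.
    return " ".join(" ".join(parser.parts).split())


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level with dotted keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat
```

Joining text chunks with spaces keeps "Sunny 2BR" and "Near train" apart when a tag separates them, at the cost of splitting a word that is interrupted by an inline tag (e.g. "<b>tr</b>ain" becomes "tr ain").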

About

Python scraper for ByOwner.com rental listings using the ScrapingAnt API. Extracts FRBO (for rent by owner) property details via AJAX POST endpoints with custom header injection. Built for educational purposes.