Structured book catalog → clean CSVs, with pagination and detail enrichment.
A Python scraper for the Books to Scrape demo eCommerce site.
It navigates categories and pages, extracts book details, and saves clean CSVs for analysis or demos.
- Pagination across all catalog pages or a chosen category.
- Structured fields: Title, Price, Availability, Rating, Category, Product URL, Image URL.
- Optional product details: UPC, Tax, Prices incl/excl tax, Stock count, Description.
- Clean outputs saved to output/books.csv (CSV format by default).
- Simple config for selectors and fields.
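The pagination feature can be illustrated with a small sketch. It is not the repository's actual code: `iter_page_urls`, `get_next_href`, and the `fake_next_links` table are hypothetical names, and the URLs assume the public Books to Scrape demo site. The idea is to resolve each page's relative "next" link against the current page URL until no link remains.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin


def iter_page_urls(
    start_url: str,
    get_next_href: Callable[[str], Optional[str]],
    max_pages: int = 1000,
) -> Iterator[str]:
    """Yield page URLs, following the 'next' link until none is left."""
    url: Optional[str] = start_url
    seen = set()  # guards against a loop if a page links back to itself
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        yield url
        # get_next_href is a hypothetical callback: it returns the relative
        # href of the "next" link on the page at `url`, or None on the last page.
        href = get_next_href(url)
        url = urljoin(url, href) if href else None


# Stand-in for a real fetch-and-parse step, so the sketch runs offline.
# The URLs are assumed, not taken from the project.
fake_next_links = {
    "https://books.toscrape.com/index.html": "catalogue/page-2.html",
    "https://books.toscrape.com/catalogue/page-2.html": "page-3.html",
    "https://books.toscrape.com/catalogue/page-3.html": None,
}

pages = list(
    iter_page_urls("https://books.toscrape.com/index.html", fake_next_links.get)
)
```

Using `urljoin` rather than string concatenation keeps the loop correct whether the href is relative to the site root or to the catalogue folder.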
- Python 3.10+
- Git
# 1) Clone
git clone https://github.com/mdugan8186/book-scraper.git
cd book-scraper
# 2) (optional) Virtual environment
python -m venv .venv
# macOS/Linux:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate
# 3) Dependencies
pip install -r requirements.txt
python run.py
- Each run creates a new timestamped CSV file in the output/ folder (e.g., books_2025-07-23_20-39-36.csv). Older files are not overwritten.
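A timestamped file name in this pattern can be built with the standard library alone. The sketch below is an assumption about how such a name might be produced, not the project's code; `build_output_path` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def build_output_path(now: datetime, out_dir: str = "output") -> Path:
    """Return a path such as output/books_2025-07-23_20-39-36.csv."""
    # The format avoids ":" so the name is also valid on Windows.
    return Path(out_dir) / f"books_{now:%Y-%m-%d_%H-%M-%S}.csv"


example = build_output_path(datetime(2025, 7, 23, 20, 39, 36))
print(example.name)  # books_2025-07-23_20-39-36.csv
```

In a real run the caller would pass `datetime.now()`. Because the name includes seconds, two runs only collide if they start within the same second.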
- Timestamped CSVs saved in output/ (e.g., books_2025-07-23_20-39-36.csv)
Columns
title, price, availability, rating, category, product_url, image_url, upc, tax, price_excl, price_incl, stock, description
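As a rough illustration of how raw page text might be cleaned into these columns, the sketch below uses only the standard library. The helper names (`parse_price`, `parse_stock`, `parse_rating`) and the sample values are made up for the example. The input shapes ("£51.77", "In stock (22 available)", "star-rating Three") are assumptions about the demo site's markup.

```python
import csv
import io
import re

COLUMNS = [
    "title", "price", "availability", "rating", "category", "product_url",
    "image_url", "upc", "tax", "price_excl", "price_incl", "stock", "description",
]

RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(text):
    """Pull the numeric part out of a price string, e.g. '£51.77' -> 51.77."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def parse_stock(text):
    """Extract the count from text like 'In stock (22 available)'."""
    match = re.search(r"\((\d+) available\)", text)
    return int(match.group(1)) if match else None


def parse_rating(class_attr):
    """Map a class attribute like 'star-rating Three' to the integer 3."""
    for word in class_attr.split():
        if word in RATING_WORDS:
            return RATING_WORDS[word]
    return None


# Made-up sample values; only some columns are filled in.
row = {
    "title": "Example Book",
    "price": parse_price("£51.77"),
    "availability": "In stock",
    "rating": parse_rating("star-rating Three"),
    "stock": parse_stock("In stock (22 available)"),
}

buffer = io.StringIO()  # an in-memory file keeps the sketch side-effect free
writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")
writer.writeheader()
writer.writerow(row)  # columns missing from `row` are written as empty cells
csv_text = buffer.getvalue()
```

Passing a fixed `fieldnames` list to `csv.DictWriter` keeps the column order stable across runs, and `restval=""` leaves the optional detail columns empty when detail enrichment is skipped.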
- CSS selectors and parsing logic are defined in the code (or in config.json, if available).
- Update the selectors there if the site markup changes.
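One way to combine in-code defaults with an optional config.json is sketched below. The layout of the file (a top-level "selectors" object), the key names, and the selector strings are all assumptions for illustration. The repository's real config, if it exists, may look different.

```python
import json
from pathlib import Path

# Hypothetical defaults; adjust to match the real markup and field names.
DEFAULT_SELECTORS = {
    "book_card": "article.product_pod",
    "title": "h3 a",
    "price": "p.price_color",
    "rating": "p.star-rating",
    "next_page": "li.next a",
}


def load_selectors(config_path="config.json"):
    """Return the default selectors, overridden by config.json when present."""
    selectors = dict(DEFAULT_SELECTORS)  # copy so the defaults stay untouched
    path = Path(config_path)
    if path.is_file():
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumed layout: {"selectors": {"price": "...", ...}}
        selectors.update(data.get("selectors", {}))
    return selectors


selectors = load_selectors()  # falls back to the defaults if no file exists
```

With this arrangement a markup change can be handled by editing one entry in config.json, while any selector not listed there keeps its in-code default.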
Example of the scraper output:
The full dataset is saved as: output/books_2025-07-23_20-39-36.csv
See TESTING.md for a step-by-step sanity flow, selector maintenance notes, and data-quality checks.
- Requests/Playwright/Selenium (depending on implementation)
- BeautifulSoup / lxml / Selectolax for parsing
- pandas for cleaning (optional)
- CSV outputs
This scraper is intended for educational and demonstration purposes only.
Please review and comply with the target site’s terms of service and robots.txt before using it beyond small-scale testing or portfolio demonstration.
This project is licensed under the MIT License. See LICENSE.
Mike Dugan — Python Web Scraper & Automation Developer
- GitHub: @mdugan8186
- Portfolio Website: scraping-portfolio
- LinkedIn: View my profile
- Fiverr: Hire me for web scraping and custom scrapers
- Upwork: Hire me for web scraping and Python automation
- Email: mdugan8186.work@gmail.com
