marrazy/activatescraper

ActivateScraper

Overview

ActivateScraper is an asynchronous web scraper that fetches and processes event data for ActivateUTS clubs. Rather than parsing rendered pages, it calls the undocumented ActivateUTS backend API directly to retrieve real-time data.

Features

  • Asynchronous HTTP requests using httpx
  • Rate limiting and request throttling
  • Data validation using Pydantic models
  • JSON output with proper formatting
  • Error handling and logging
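As a rough illustration of the throttling idea listed above, the sketch below caps the number of in-flight requests with an asyncio.Semaphore. It is not taken from the project's source: fetch_all, fake_fetch and max_concurrent are hypothetical names, and fake_fetch stands in for a real HTTP call (in the actual scraper this would presumably be an httpx.AsyncClient request).

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fetch_all(
    paths: List[str],
    fetch: Callable[[str], Awaitable[dict]],
    max_concurrent: int = 5,
) -> Dict[str, dict]:
    """Fetch every path concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fetch_one(path: str) -> dict:
        # Each task waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await fetch(path)

    # gather() preserves input order, so results line up with paths.
    results = await asyncio.gather(*(fetch_one(p) for p in paths))
    return dict(zip(paths, results))


async def fake_fetch(path: str) -> dict:
    # Hypothetical stand-in for a real HTTP request.
    await asyncio.sleep(0.01)
    return {"club": path, "events": []}


if __name__ == "__main__":
    data = asyncio.run(fetch_all(["club-a", "club-b"], fake_fetch, max_concurrent=2))
    print(data)
```

A semaphore limits concurrency rather than requests per second; a stricter rate limit would also need a delay between requests.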

Requirements

  • Python 3.7+
  • httpx
  • asyncio (part of the Python standard library; no separate install needed)
  • pydantic

Installation

  1. Clone the repository:
git clone https://github.com/tihhh/activatescraper.git
cd activatescraper
  2. Install dependencies:
pip install -r requirements.txt

Usage

Run the scraper:

python main.py

The scraper will:

  1. Load club endpoints from files/club_paths.json (data taken from ActivateUTS sitemap)
  2. Fetch event data for each club asynchronously
  3. Save the results to data/club_data.json
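The three steps above can be sketched roughly as follows. This is a minimal outline, not the contents of main.py: run_pipeline and the fetch callable are hypothetical names, and the sketch assumes the club paths file is a JSON array of strings, which the README does not confirm.

```python
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Awaitable, Callable, List


async def run_pipeline(
    paths_file: Path,
    output_file: Path,
    fetch: Callable[[str], Awaitable[list]],
) -> dict:
    # Step 1: load club paths (assumed format: a JSON array of strings).
    club_paths: List[str] = json.loads(paths_file.read_text(encoding="utf-8"))

    # Step 2: fetch events for every club concurrently.
    events = await asyncio.gather(*(fetch(path) for path in club_paths))
    results = dict(zip(club_paths, events))

    # Step 3: write indented JSON, creating the output directory if needed.
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_text(
        json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return results


async def fake_fetch(path: str) -> list:
    # Hypothetical stand-in for the real API request.
    return [{"club": path, "title": "Example event"}]


if __name__ == "__main__":
    # Demo in a temporary directory so no real project files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        paths = Path(tmp) / "files" / "club_paths.json"
        paths.parent.mkdir(parents=True)
        paths.write_text(json.dumps(["club-a", "club-b"]), encoding="utf-8")
        out = Path(tmp) / "data" / "club_data.json"
        asyncio.run(run_pipeline(paths, out, fake_fetch))
        print(out.read_text(encoding="utf-8"))
```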

Error Handling

The scraper includes:

  • Rate limit detection and retry mechanism
  • Error logging for failed requests
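One way a rate-limit retry like this could work is sketched below. It assumes the API signals rate limiting with HTTP status 429 (Too Many Requests) and retries with exponential backoff; the README does not say how the scraper actually detects rate limits. fetch_with_retry, send, max_retries and base_delay are hypothetical names, and send is an injected coroutine returning a (status, retry_after, body) tuple so the sketch stays independent of any HTTP library.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, Tuple

logger = logging.getLogger("activatescraper.sketch")

# (status_code, retry-after seconds or None, parsed body or None)
Response = Tuple[int, Optional[float], Optional[dict]]


async def fetch_with_retry(
    send: Callable[[str], Awaitable[Response]],
    url: str,
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> Optional[dict]:
    """Return the body on success, or None after logging a failure."""
    for attempt in range(max_retries + 1):
        try:
            status, retry_after, body = await send(url)
        except Exception as exc:  # network errors, timeouts, etc.
            logger.error("Request to %s failed: %s", url, exc)
            return None

        if status == 429:
            if attempt == max_retries:
                break  # out of retries
            # Prefer the server's hint; otherwise back off exponentially.
            delay = retry_after if retry_after is not None else base_delay * 2 ** attempt
            logger.warning("Rate limited on %s, retrying in %.1fs", url, delay)
            await asyncio.sleep(delay)
            continue

        if status >= 400:
            logger.error("Request to %s returned HTTP %d", url, status)
            return None
        return body

    logger.error("Giving up on %s after %d retries", url, max_retries)
    return None
```

Returning None instead of raising lets a batch run continue past a single failed club; whether the project makes the same choice is not stated in the README.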

About

Asynchronous Scraper for ActivateUTS Event Page
