🎉 Doctor Reviews Scraper

This project is a web scraper built with Python and Selenium that extracts detailed doctor reviews and ratings from a medical website for research purposes. The scraper collects data such as doctor names, specialties, ratings, and user comments, and saves the results to a CSV file.

📍 Features

Dynamic Web Scraping: Extracts multiple pages of data using Selenium.
Multithreading: Accelerates scraping by processing multiple pages concurrently with ThreadPoolExecutor.
Comprehensive Data: Collects doctor details, reviews, tags, and ratings.
Custom Handling: Handles ads, cookies, and dynamically loaded content.

🔌 Installation

Prerequisites

Install Python (3.7 or higher).
Install Google Chrome and download the ChromeDriver version that matches your installed Chrome version.
Install required Python libraries:
```
pip install selenium tqdm
```

📄 File Structure

project/
├── chromedriver  # ChromeDriver executable
├── scraper.py    # Main Python script
└── doctor.csv    # Output CSV file

Usage

1. Set Up ChromeDriver

Set the PATH variable in the get_driver() function to the location of your ChromeDriver executable:

PATH = "/path/to/your/chromedriver"
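
For orientation, here is a minimal sketch of what a get_driver() function could look like with the Selenium 4 API. The script's actual implementation is not shown in this README, so treat the details as assumptions: the build_chrome_arguments helper, the headless default, and the specific Chrome switches are illustrative only. Only PATH and the get_driver() name come from the text above.

```python
PATH = "/path/to/your/chromedriver"  # replace with your ChromeDriver location


def build_chrome_arguments(headless=True):
    """Return Chrome command-line switches (hypothetical helper, not in the script)."""
    arguments = ["--window-size=1920,1080", "--disable-notifications"]
    if headless:
        arguments.append("--headless=new")  # headless mode of recent Chrome versions
    return arguments


def get_driver(headless=True):
    """Create a Chrome WebDriver that uses the ChromeDriver binary at PATH."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    for argument in build_chrome_arguments(headless):
        options.add_argument(argument)
    return webdriver.Chrome(service=Service(PATH), options=options)
```

Recent Selenium releases (4.6 and later) can usually locate a matching driver on their own through Selenium Manager, in which case passing Service(PATH) may not be necessary.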

2. Run the Script

Execute the script in the terminal:

python scraper.py

3. Output

The extracted data will be saved in doctor.csv with the following columns:

d_name: Doctor's name
d_speciality: Doctor's specialty
total_score: Overall rating score
total_survey_count: Total number of surveys
five_star, four_star, three_star, two_star, one_star: Number and percentage for each rating category
positive_tags, negative_tags: Lists of positive and negative tags
comment_text: User comments
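
The column layout above could be written with the standard csv module roughly as follows. This is a sketch, not the script's own code: the COLUMNS constant and the write_rows and save_to_csv helpers are hypothetical names, and it assumes each scraped profile is available as a dictionary keyed by column name.

```python
import csv

# Column order as listed above.
COLUMNS = [
    "d_name", "d_speciality", "total_score", "total_survey_count",
    "five_star", "four_star", "three_star", "two_star", "one_star",
    "positive_tags", "negative_tags", "comment_text",
]


def write_rows(rows, file_obj):
    """Write a header line and one CSV line per row dictionary."""
    # restval fills columns a row does not provide with an empty string.
    writer = csv.DictWriter(file_obj, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def save_to_csv(rows, filename="doctor.csv"):
    """Save rows to a CSV file (hypothetical helper)."""
    # newline="" prevents blank lines between rows on Windows.
    with open(filename, "w", newline="", encoding="utf-8") as csv_file:
        write_rows(rows, csv_file)
```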

Code Structure

Functions

iselement(browser, cssselector): Checks whether an element exists on the webpage.
get_driver(): Sets up and returns a Selenium WebDriver instance.
get_doc_linklist(url): Scrapes doctor profile links from multiple pages.
get_doctor_details(link): Extracts detailed information from each doctor's profile page.
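
As an illustration of the existence check, iselement could be implemented along these lines. This is an assumption about the approach, not the script's actual code; it relies on find_elements returning an empty list when nothing matches, and uses the literal locator string so the sketch does not need Selenium imported.

```python
CSS_SELECTOR = "css selector"  # same string as selenium's By.CSS_SELECTOR


def iselement(browser, cssselector):
    """Return True if at least one element matches the CSS selector."""
    # find_elements returns an empty list instead of raising when nothing matches.
    return len(browser.find_elements(CSS_SELECTOR, cssselector)) > 0
```

In a real script, `from selenium.webdriver.common.by import By` and `By.CSS_SELECTOR` would be the more conventional way to pass the locator strategy.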

Multithreading

Uses ThreadPoolExecutor to scrape multiple doctor profiles simultaneously:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=10) as executor:
    executor.map(get_doctor_details, link_list)  # one task per profile link
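
If the scraped rows need to be collected rather than written inside each worker, one possible variation is sketched below. The get_doctor_details body here is a stand-in that returns a dummy row, and scrape_all is a hypothetical wrapper; neither reflects the script's real code.

```python
from concurrent.futures import ThreadPoolExecutor


def get_doctor_details(link):
    """Stand-in for the real scraper function; returns one row per profile."""
    return {"d_name": "doctor from " + link}


def scrape_all(link_list, max_workers=10):
    """Scrape every link concurrently and return the rows in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() consumes the lazy iterator, so worker exceptions surface here.
        return list(executor.map(get_doctor_details, link_list))
```

WebDriver instances are generally not safe to share between threads, so a setup like this would presumably have each call create its own driver with get_driver() and quit it when finished.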

Handling Common Issues

1. Ad Popups

Automatically closes popups that obstruct scraping.

2. Cookies

Handles cookie popups that block interaction.
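
A rough sketch of how ad and cookie popups might be dismissed is shown below. The selectors are hypothetical placeholders (the target site's markup is not given in this README), and dismiss_popups is an illustrative name rather than a function from the script.

```python
try:
    from selenium.common.exceptions import WebDriverException
except ImportError:  # lets the sketch load where Selenium is not installed
    WebDriverException = Exception

CSS_SELECTOR = "css selector"  # same string as selenium's By.CSS_SELECTOR

# Hypothetical selectors; the real site's markup will differ.
POPUP_SELECTORS = ["button#accept-cookies", "div.ad-popup button.close"]


def dismiss_popups(browser, selectors=POPUP_SELECTORS):
    """Click the first match of each selector; return how many were clicked."""
    dismissed = 0
    for selector in selectors:
        matches = browser.find_elements(CSS_SELECTOR, selector)
        if not matches:
            continue  # this popup is not on the page
        try:
            matches[0].click()
            dismissed += 1
        except WebDriverException:
            pass  # element vanished or is covered; carry on scraping
    return dismissed
```

Calling a helper like this after each page load keeps popup handling in one place instead of scattering click attempts through the scraping code.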

3. Pagination

Navigates through the result pages until the page limit (99 pages) is reached.
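
The page loop can be sketched independently of Selenium as follows. Here fetch_page_links is a hypothetical callback that returns the profile links found on a given page number, and stopping early on an empty page is an assumption; the script's own get_doc_linklist(url) may be organized differently.

```python
def collect_links(fetch_page_links, max_pages=99):
    """Gather profile links page by page, stopping at max_pages or an empty page."""
    links = []
    for page_number in range(1, max_pages + 1):
        page_links = fetch_page_links(page_number)
        if not page_links:
            break  # ran out of results before reaching the limit
        links.extend(page_links)
    return links
```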

Output

Here is the link to the output
