LennartPaduch/YT-Popular-Videos-Retrieval-Script


YouTube Popular Videos Retrieval Script

This Python script retrieves and stores information about up to the top 200 trending videos, based on the specified category_ids, for each specified country. It saves this information in a PostgreSQL table called yt_videos. If the script retrieves a video_id that already exists in yt_videos, it updates that row. In addition, every retrieved video is always inserted into the yt_videos_history table, which keeps a historical record and enables data analysis such as tracking how metrics change over time. The script also calculates a SHA-256 hash for each video thumbnail and detects thumbnail changes by comparing these hashes with the ones stored in the database. Finally, it automatically downloads new or updated thumbnails.
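The update-or-insert behaviour described above can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes trimmed-down, hypothetical tables with only video_id, title and view_count columns, and store_video is a hypothetical helper. It uses Python's built-in sqlite3 module as an in-memory stand-in so it runs without a server. The INSERT ... ON CONFLICT ... DO UPDATE statement is also valid PostgreSQL, where a driver such as psycopg2 would use %s placeholders instead of ?.

```python
import sqlite3

# Hypothetical, trimmed-down schemas; the repository's real tables are defined
# in the SQL tables folder and contain more columns.
SCHEMA = """
CREATE TABLE yt_videos (
    video_id   TEXT PRIMARY KEY,
    title      TEXT,
    view_count INTEGER
);
CREATE TABLE yt_videos_history (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    video_id   TEXT,
    title      TEXT,
    view_count INTEGER
);
"""

# Upsert: insert a new row, or overwrite the existing row with the same video_id.
# (SQLite needs version 3.24+ for this syntax; PostgreSQL accepts it as written.)
UPSERT = """
INSERT INTO yt_videos (video_id, title, view_count)
VALUES (?, ?, ?)
ON CONFLICT (video_id) DO UPDATE
SET title = excluded.title,
    view_count = excluded.view_count
"""

# History rows are appended unconditionally, one per retrieval.
HISTORY = """
INSERT INTO yt_videos_history (video_id, title, view_count)
VALUES (?, ?, ?)
"""


def store_video(conn, video_id, title, view_count):
    """Hypothetical helper: upsert the current snapshot and log it to history."""
    row = (video_id, title, view_count)
    conn.execute(UPSERT, row)
    conn.execute(HISTORY, row)
    conn.commit()
```

Storing the same video_id twice leaves one row in yt_videos holding the latest values and two rows in yt_videos_history.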

Features

  • Uses the YouTube Data API to retrieve video information
  • Stores video information in a PostgreSQL database
  • Calculates a SHA-256 hash for each video thumbnail
  • Detects thumbnail changes by comparing these hashes with the ones stored in the database
  • Automatically downloads new or updated thumbnails
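The hash-based change detection can be sketched like this, assuming the thumbnail bytes have already been fetched and the previously stored hash (or None for a video seen for the first time) has been read from the database. The function names are hypothetical, not taken from main.py.

```python
import hashlib
from typing import Optional, Tuple


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of raw thumbnail bytes."""
    return hashlib.sha256(data).hexdigest()


def thumbnail_changed(data: bytes, stored_hash: Optional[str]) -> Tuple[bool, str]:
    """Compare freshly fetched thumbnail bytes with the hash stored in the DB.

    Returns (changed, new_hash). A missing stored hash (a new video) counts as
    a change, so the thumbnail is downloaded the first time it is seen.
    """
    new_hash = sha256_hex(data)
    return new_hash != stored_hash, new_hash
```

When changed is True, the caller would save the image and write new_hash back to the database, so the next run compares against the updated value.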

Requirements

  • Python
  • A PostgreSQL database (Railway, for example, offers a quick and easy setup)
  • A valid YouTube API key
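The API key is sent with each request to the YouTube Data API's videos.list endpoint. The sketch below shows how a request for one region's trending chart could be built; it is an illustration, not the script's own code, and build_popular_videos_url is a hypothetical name. The parameters (part, chart=mostPopular, regionCode, videoCategoryId, maxResults, pageToken, key) are those documented for videos.list.

```python
from typing import Optional
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/youtube/v3/videos"


def build_popular_videos_url(api_key: str, region_code: str,
                             category_id: Optional[str] = None,
                             page_token: Optional[str] = None) -> str:
    """Build a videos.list request for the mostPopular chart of one region."""
    params = {
        "part": "snippet,statistics",
        "chart": "mostPopular",
        "regionCode": region_code,
        "maxResults": 50,          # largest page size videos.list allows
        "key": api_key,
    }
    if category_id is not None:    # omit the filter to get the 'All' chart
        params["videoCategoryId"] = category_id
    if page_token is not None:     # nextPageToken from the previous response
        params["pageToken"] = page_token
    return f"{API_URL}?{urlencode(params)}"
```

With 50 results per page, following nextPageToken for up to four pages covers the up-to-200 videos mentioned above.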

Getting Started

  1. Clone the repository
  2. Set up the PostgreSQL database using the table creation statements in the SQL tables folder. You may adapt these tables to your use case; if you do, also modify the corresponding parts of main.py, such as the table structure it expects, query conditions, or any other affected code.
  3. Obtain a valid YouTube API key
  4. Replace the placeholder values in main.py and database.ini with your own credentials
  5. Activate the virtual environment .venv or create your own, then install the dependencies listed in requirements.txt (pip install -r requirements.txt)
  6. Run the script
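One common way to read connection settings from an INI file such as database.ini is Python's configparser. This is a minimal sketch, assuming a [postgresql] section with host, database, user and password keys; the section name, the keys and load_db_config itself are hypothetical and should be matched to the repository's actual database.ini.

```python
from configparser import ConfigParser


def load_db_config(filename: str = "database.ini", section: str = "postgresql") -> dict:
    """Read connection parameters from an INI file into a dict.

    The default section name "postgresql" is an assumption; adjust it to the
    section used in your database.ini.
    """
    parser = ConfigParser()
    # read() returns the list of files it parsed successfully.
    if not parser.read(filename):
        raise FileNotFoundError(filename)
    if not parser.has_section(section):
        raise KeyError(f"section [{section}] not found in {filename}")
    return dict(parser.items(section))
```

The resulting dict can be passed to a driver as keyword arguments, for example psycopg2.connect(**load_db_config()).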

Virtual Environment

To activate an existing virtual environment, run the activate script in the Scripts folder of the virtual environment directory. You can activate the .venv environment from this repository by typing .venv/Scripts/activate in the terminal or command line while in the directory that contains the .venv folder. On Linux and macOS, environments you create yourself keep this script in a bin folder instead and are activated with source <directory>/bin/activate.

To create a new virtual environment, run the following command in your terminal or command line:
python -m venv <directory>

For example, to create a virtual environment named venv (a common choice), run:
python -m venv venv
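The same environment can also be created from Python with the standard-library venv module. The sketch below, whose create_env helper is hypothetical, mirrors python -m venv <directory> and returns the path of the activation script, which lives under Scripts on Windows and under bin on Linux and macOS.

```python
import os
import venv
from pathlib import Path


def create_env(directory: str, with_pip: bool = True) -> Path:
    """Create a virtual environment, as `python -m venv <directory>` does,
    and return the path of its shell activation script."""
    # with_pip=True bootstraps pip via ensurepip, matching the CLI default.
    venv.create(directory, with_pip=with_pip)
    # Windows places the scripts in "Scripts"; Linux and macOS use "bin".
    scripts_dir = "Scripts" if os.name == "nt" else "bin"
    return Path(directory) / scripts_dir / "activate"
```

For example, create_env("venv") returns venv/bin/activate on Linux, which is then activated in the shell with source venv/bin/activate.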

Running the Script

Note: By default, the primary and secondary options query the categories 'All', 'Gaming', 'Comedy' and 'Entertainment'. You can change these categories in the main function of main.py.
If you run the script with any value other than 'PRIMARY' or 'SECONDARY' for the --countries argument, it retrieves data only from the 'All' category, which spans every category but is limited to a maximum of 200 videos. The value(s) passed to --countries are case-insensitive.
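The category-selection rule above can be sketched as a small function. This is an illustration under stated assumptions: categories_for is hypothetical, and the numeric IDs are the ones YouTube's videoCategories.list reports for the US region (Gaming 20, Comedy 23, Entertainment 24). It is assumed, not confirmed, that main.py uses the same IDs.

```python
# Assumed category IDs (as listed by videoCategories.list for the US region).
# None means "no videoCategoryId filter", i.e. the 'All' chart.
CATEGORY_IDS = {"All": None, "Gaming": "20", "Comedy": "23", "Entertainment": "24"}

DEFAULT_CATEGORIES = ["All", "Gaming", "Comedy", "Entertainment"]


def categories_for(countries_arg: str) -> list:
    """Return the categories to query for a given --countries value.

    'primary' and 'secondary' (case-insensitive) query every default category;
    any other value falls back to the 'All' category only.
    """
    if countries_arg.strip().upper() in ("PRIMARY", "SECONDARY"):
        return list(DEFAULT_CATEGORIES)
    return ["All"]
```

For example, categories_for("Primary") returns all four default categories, while categories_for("DE, FR") returns only ["All"].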

You have several options for the --countries argument when running the script:

  • python src/main.py --countries primary
    Retrieves information about the most popular videos per category from each of the 22 primary countries.
  • python src/main.py --countries secondary
    Retrieves information about the most popular videos per category from each of the 87 less-viewed countries.
  • python src/main.py --countries all
    Retrieves information about up to the top 200 trending videos from all 109 countries.

Alternatively, you can specify any number of arbitrary countries, for example python src/main.py --countries "DE, RU, FR, US". Separate the countries with commas, and if you specify more than one country, enclose the whole list in quotation marks. For example, python src/main.py --countries "DE,FR" is valid, whereas python src/main.py --countries DE,FR is not.
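Parsing the --countries value could look roughly like the sketch below, built on argparse. The region lists are hypothetical, shortened stand-ins (the script's real lists hold 22 primary and 87 secondary countries), and parse_countries and build_parser are illustrative names rather than functions from main.py. Matching is case-insensitive, as described above.

```python
import argparse

# Hypothetical, shortened region lists for illustration only.
PRIMARY = ["US", "DE", "GB"]
SECONDARY = ["AT", "CH"]


def parse_countries(value: str) -> list:
    """Turn a --countries value into a list of two-letter region codes."""
    key = value.strip().lower()
    if key == "primary":
        return list(PRIMARY)
    if key == "secondary":
        return list(SECONDARY)
    if key == "all":
        return PRIMARY + SECONDARY
    # Arbitrary list such as "DE, RU, FR, US": split on commas, trim, upper-case,
    # and skip empty entries left by a trailing comma.
    return [code.strip().upper() for code in value.split(",") if code.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Fetch trending YouTube videos")
    parser.add_argument("--countries", required=True, type=parse_countries,
                        help='primary, secondary, all, or a list such as "DE, FR"')
    return parser
```

Because argparse applies parse_countries to the whole argument, quoting a list that contains spaces keeps it as a single value for the shell to pass through.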

Note:
To use the YouTube API key efficiently and stay within its default daily quota of 10,000 units, the countries supported by YouTube have been split into two subsets: primary and secondary. I configured the script to run hourly for the primary regions and every six hours for the secondary regions, using cron jobs on an AWS EC2 instance running Debian.
You can customize the country priorities or remove the split altogether, depending on your setup, use case, and your API key's quota limits.
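A back-of-the-envelope estimate helps when rebalancing the two subsets. The sketch assumes each videos.list request costs 1 quota unit (the documented cost of that method) and that every category returns the same number of result pages. daily_quota_units is a hypothetical helper, and the page counts you plug in are your own assumptions.

```python
def daily_quota_units(countries: int, categories: int, pages_per_category: int,
                      runs_per_day: int, units_per_request: int = 1) -> int:
    """Estimate the daily quota cost of polling videos.list.

    Assumes one request per result page, each costing `units_per_request`
    (1 unit for videos.list), and the same page count for every category.
    """
    requests_per_run = countries * categories * pages_per_category
    return requests_per_run * runs_per_day * units_per_request
```

For example, daily_quota_units(22, 4, 4, 24) returns 8448 (22 countries × 4 categories × 4 pages × 24 hourly runs), a worst case that assumes every category fills all four pages of 50 results.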

Documentation

For further details on the implementation and usage of the script, please refer to the inline documentation within the code.

About

Fetches the most popular trending videos from the YouTube API and saves them to a PostgreSQL database. Downloads new thumbnails by comparing the SHA-256 hash of every thumbnail.
