ray-cys/metaDB

MetaDB Project

MetaDB is a Dockerized application that automatically extracts, parses, and manages IMDb dataset information using Python and SQLite. It aims to provide a database container that other services can reach over the Docker network to work with the dataset.

Features

  • Automated Data Extraction: Downloads the IMDb dataset from the official website on a user-defined schedule.
  • Data Parsing: Combines relevant fields (title, IMDb ID, year, director, writers, producer, etc.) into a single record per title for easier access and management.
  • SQLite Database Management: Builds and manages an SQLite database directly from Python, with capabilities to insert parsed data and clean out old metadata.
  • User Configurable: Utilizes a YAML configuration file for dataset links, scheduling options, and other necessary settings.
  • Logging and Error Handling: Logs errors and run summaries, keeping at most 5 log files and renaming older logs automatically.
  • Memory Management: Keeps in-memory caching to a minimum and clears the cache at the end of each run.
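
The parsing and database features above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes tab-separated input where `\N` marks a missing value, and the sample rows, the column names, the `titles` table, and the `read_tsv`, `combine`, and `load` helpers are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows in a tab-separated layout where "\N" marks a
# missing value. Column names are assumptions for this sketch.
BASICS_TSV = "tconst\tprimaryTitle\tstartYear\ntt0000001\tCarmencita\t1894\ntt0000002\tUntitled\t\\N\n"
CREW_TSV = "tconst\tdirectors\twriters\ntt0000001\tnm0005690\t\\N\n"


def read_tsv(text):
    """Yield one dict per row, turning the "\\N" placeholder into None."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == r"\N" else value) for key, value in row.items()}


def combine(basics_text, crew_text):
    """Merge basics and crew rows into one flat record per title ID."""
    crew = {row["tconst"]: row for row in read_tsv(crew_text)}
    for title in read_tsv(basics_text):
        extra = crew.get(title["tconst"], {})
        yield (
            title["tconst"],
            title["primaryTitle"],
            int(title["startYear"]) if title["startYear"] else None,
            extra.get("directors"),
            extra.get("writers"),
        )


def load(conn, records):
    """Replace the table contents with freshly parsed records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS titles ("
        "imdb_id TEXT PRIMARY KEY, title TEXT, year INTEGER, "
        "directors TEXT, writers TEXT)"
    )
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM titles")  # clean out old metadata
        conn.executemany("INSERT INTO titles VALUES (?, ?, ?, ?, ?)", records)


# An in-memory database keeps the demo self-contained.
conn = sqlite3.connect(":memory:")
load(conn, combine(BASICS_TSV, CREW_TSV))
rows = conn.execute("SELECT * FROM titles ORDER BY imdb_id").fetchall()
```

Passing a generator to `executemany` means the combined records are streamed into the database rather than collected in a list first, which fits the goal of keeping memory use low.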

Project Structure

MetaDB
├── src
│   ├── main.py               # Entry point of the application
│   ├── imdb_extractor.py     # IMDb data extraction logic
│   ├── parser.py             # Data parsing functionality
│   ├── db_builder.py         # SQLite database management
│   ├── config.yaml           # User-configurable settings
│   └── utils
│       └── logger.py         # Logging utility
├── requirements.txt          # Python dependencies
├── Dockerfile                # Docker image build instructions
├── docker-compose.yml        # Service definitions and configurations
├── README.md                 # Project documentation
└── .gitignore                # Git ignore rules

Setup Instructions

  1. Clone the Repository:

    git clone https://github.com/ray-cys/metaDB.git
    cd metaDB
    
  2. Install Dependencies: Ensure you have Python and Docker installed. Then, install the required Python packages:

    pip install -r requirements.txt
    
  3. Configure the Application: Edit the src/config.yaml file to set your dataset links and scheduling options.

  4. Build the Docker Image:

    docker build -t metadb .
    
  5. Run the Application: Use Docker Compose to start the application:

    docker-compose up
    

Usage

Once the application is running, you can interact with the SQLite database from other containers using the Docker network. The database will be stored on an external volume, ensuring data persistence.
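
As a rough sketch of how a consumer might read the data: SQLite is file-based, so this example assumes the other container mounts the same volume and opens the database file directly in read-only mode. The file name `metadb.sqlite`, the `titles` table, and the `open_readonly` helper are hypothetical; the demo creates a throwaway file so it can run on its own.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_readonly(db_path):
    """Open an existing SQLite file in read-only mode via a URI."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


# Demo with a throwaway file standing in for the database on the shared volume.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "metadb.sqlite")  # hypothetical file name
    writer = sqlite3.connect(db_path)
    writer.execute(
        "CREATE TABLE titles (imdb_id TEXT PRIMARY KEY, title TEXT, year INTEGER)"
    )
    writer.execute("INSERT INTO titles VALUES ('tt0000001', 'Carmencita', 1894)")
    writer.commit()
    writer.close()

    reader = open_readonly(db_path)
    found = reader.execute(
        "SELECT title, year FROM titles WHERE imdb_id = ?", ("tt0000001",)
    ).fetchone()
    try:
        reader.execute("DELETE FROM titles")
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True  # read-only connections reject writes
    reader.close()
```

Opening with `mode=ro` keeps consumers from modifying the database while the application rebuilds or updates it.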

Logging

Logs are written to the external volume specified in the Docker configuration. Each run creates a new log file, and older logs are renamed with a numeric suffix (for example, -1.log), so that at most 5 log files are kept.
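
One way such a rotation could work is sketched below. It is an assumption-laden illustration rather than the project's logger: the base name `metadb`, the `rotate_logs` helper, and the exact suffix scheme (`-1.log` through `-4.log` plus the current file) are hypothetical.

```python
import tempfile
from pathlib import Path

MAX_LOGS = 5  # limit stated in the README


def rotate_logs(log_dir, name="metadb"):
    """Shift name.log -> name-1.log -> name-2.log ..., keeping MAX_LOGS files."""
    log_dir = Path(log_dir)
    # The oldest slot falls off the end once the limit is reached.
    oldest = log_dir / f"{name}-{MAX_LOGS - 1}.log"
    if oldest.exists():
        oldest.unlink()
    # Rename from the oldest down so no file is overwritten.
    for index in range(MAX_LOGS - 2, 0, -1):
        source = log_dir / f"{name}-{index}.log"
        if source.exists():
            source.rename(log_dir / f"{name}-{index + 1}.log")
    current = log_dir / f"{name}.log"
    if current.exists():
        current.rename(log_dir / f"{name}-1.log")
    return current  # path the new run should write to


# Simulate seven runs in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for run in range(1, 8):
        rotate_logs(tmp).write_text(f"run {run}\n")
    names = sorted(p.name for p in Path(tmp).iterdir())
    newest = (Path(tmp) / "metadb.log").read_text()
    oldest_kept = (Path(tmp) / "metadb-4.log").read_text()
```

Python's standard `logging.handlers.RotatingFileHandler` offers similar behavior with a `backupCount` limit, but it names backups with a trailing `.1`, `.2`, and so on, which differs from the `-1.log` style described here.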

Contributing

Contributions are welcome! Please submit a pull request or open an issue for any enhancements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for details.
