MetaDB is a Dockerized application that automatically extracts, parses, and manages IMDb dataset information using Python and SQLite. It aims to provide a database container that other services can reach over the Docker network to work with the dataset.
- Automated Data Extraction: Automatically extracts the IMDb dataset from the official website based on a user-defined schedule.
- Data Parsing: Combines relevant fields (title, IMDb ID, year, director, writers, producer, etc.) into a single line for easier access and management.
- SQLite Database Management: Builds and manages an SQLite database directly from Python, with capabilities to insert parsed data and clean out old metadata.
- User Configurable: Utilizes a YAML configuration file for dataset links, scheduling options, and other necessary settings.
- Logging and Error Handling: Implements robust logging to track errors and summaries, maintaining a maximum of 5 log files with automatic renaming for older logs.
- Memory Management: Keeps in-memory caching to a minimum and clears the cache at the end of each run.
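The parsing and database features above can be illustrated with a minimal sketch. It is not the code in parser.py or db_builder.py. It assumes the input follows the column layout of IMDb's title.basics.tsv.gz file, where `\N` marks a missing value. The `titles` table, the function names, and the batch size are hypothetical. Rows are streamed and inserted in batches so the whole dataset never has to sit in memory.

```python
import csv
import gzip
import io
import sqlite3

# Two sample rows in the title.basics.tsv.gz layout. The second row is made up
# purely to show how the "\N" missing-value marker is handled.
SAMPLE = (
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\tstartYear"
    "\tendYear\truntimeMinutes\tgenres\n"
    "tt0000001\tshort\tCarmencita\tCarmencita\t0\t1894\t\\N\t1\tDocumentary,Short\n"
    "tt9999999\tmovie\tUntitled Example\tUntitled Example\t0\t\\N\t\\N\t\\N\t\\N\n"
)


def parse_basics(stream):
    """Yield (imdb_id, title, year) tuples from a title.basics-style TSV stream."""
    # QUOTE_NONE: titles may contain quote characters that are not CSV quoting.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        year = None if row["startYear"] == r"\N" else int(row["startYear"])
        yield row["tconst"], row["primaryTitle"], year


def load_titles(conn, rows, batch_size=1000):
    """Insert rows into a hypothetical `titles` table in batches; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS titles ("
        "imdb_id TEXT PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
    )
    sql = "INSERT OR REPLACE INTO titles VALUES (?, ?, ?)"
    batch, total = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany(sql, batch)
            total += len(batch)
            batch.clear()  # release the batch before reading further rows
    if batch:
        conn.executemany(sql, batch)
        total += len(batch)
    conn.commit()
    return total


def demo():
    """Compress the sample in memory, then stream it into an in-memory database."""
    raw = gzip.compress(SAMPLE.encode("utf-8"))
    conn = sqlite3.connect(":memory:")
    with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8", newline="") as stream:
        count = load_titles(conn, parse_basics(stream))
    return conn, count


if __name__ == "__main__":
    conn, count = demo()
    print(count, conn.execute("SELECT * FROM titles ORDER BY imdb_id").fetchall())
```

`INSERT OR REPLACE` keyed on the IMDb ID means a repeated run overwrites existing rows instead of duplicating them, which is one possible way to refresh old metadata.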
MetaDB
├── src
│   ├── main.py              # Entry point of the application
│   ├── imdb_extractor.py    # IMDb data extraction logic
│   ├── parser.py            # Data parsing functionality
│   ├── db_builder.py        # SQLite database management
│   ├── config.yaml          # User-configurable settings
│   └── utils
│       └── logger.py        # Logging utility
├── requirements.txt         # Python dependencies
├── Dockerfile               # Docker image build instructions
├── docker-compose.yml       # Service definitions and configurations
├── README.md                # Project documentation
└── .gitignore               # Git ignore rules
- Clone the Repository:
    git clone https://github.com/yourusername/MetaDB.git
    cd MetaDB
- Install Dependencies: Ensure you have Python and Docker installed. Then, install the required Python packages:
    pip install -r requirements.txt
- Configure the Application: Edit the src/config.yaml file to set your dataset links and scheduling options.
- Build the Docker Image:
    docker build -t metadb .
- Run the Application: Use Docker Compose to start the application:
    docker-compose up
Once the application is running, you can interact with the SQLite database from other containers using the Docker network. The database will be stored on an external volume, ensuring data persistence.
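As a rough illustration of reading the database from another container, the sketch below opens the SQLite file in read-only mode using a `file:` URI. The database path, the `titles` table, and the function names are assumptions for the example, not names defined by MetaDB. Use whatever path your consumer container mounts the shared volume at.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path):
    """Open an existing SQLite file read-only via a file: URI."""
    # mode=ro makes SQLite refuse writes, so a consumer container cannot
    # modify the database by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def count_titles(db_path):
    """Count rows in a hypothetical `titles` table."""
    conn = open_readonly(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM titles").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical mount point of the shared volume inside the consumer container.
    print(count_titles("/data/metadb.sqlite"))
```

Opening the file read-only keeps the MetaDB container as the only writer, which avoids write contention on the shared file.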
Logs are written to the external volume specified in the Docker configuration. Each run creates a new log file; older logs are renamed with a numeric suffix (for example, -1.log), and at most 5 log files are kept.
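The rotation described above could look roughly like the following sketch. It is not the code in utils/logger.py. The base name `metadb` and the exact numbering scheme (`metadb.log` for the current run, `metadb-1.log` for the previous one, and so on) are assumptions. Only the limit of 5 files comes from the text.

```python
from pathlib import Path

MAX_LOGS = 5  # the README states that at most 5 log files are kept


def rotate_logs(log_dir, base="metadb", max_logs=MAX_LOGS):
    """Shift existing logs up by one suffix and return the path for the new log.

    Assumed scheme: metadb.log is the current run, metadb-1.log the previous
    one, up to metadb-4.log for the oldest log that is still kept.
    """
    log_dir = Path(log_dir)
    current = log_dir / f"{base}.log"

    # With max_logs files in total, the highest suffix is max_logs - 1.
    oldest = log_dir / f"{base}-{max_logs - 1}.log"
    if oldest.exists():
        oldest.unlink()

    # Rename from the highest suffix down so no rename hits an existing file.
    for index in range(max_logs - 2, 0, -1):
        src = log_dir / f"{base}-{index}.log"
        if src.exists():
            src.rename(log_dir / f"{base}-{index + 1}.log")

    if current.exists():
        current.rename(log_dir / f"{base}-1.log")
    return current


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        for run in range(1, 8):  # simulate seven runs
            rotate_logs(tmp).write_text(f"run {run}\n")
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Calling `rotate_logs` at the start of each run, before the logger opens its file, keeps the newest log under a stable name that is easy to tail.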
Contributions are welcome! Please submit a pull request or open an issue for any enhancements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.