Contributing

Thank you for your interest in contributing to the Gutenberg scraper! This document provides guidelines for contributing to the project.

For general openZIM contribution guidelines, see the openZIM Contributing Wiki.

Project Structure

The project consists of several components:

scraper/: Python scraper that downloads books and generates ZIM files
ui/: Vue.js frontend that provides the user interface within the ZIM
locales/: UI translation files (multiple languages supported)
scraper/docs/: Technical documentation
- JSON_FILE_STRUCTURE.md: JSON schema documentation for the Vue.js UI
- GUTENBERG_STRUCTURE.md: Project Gutenberg structure and metadata documentation

Ways to Contribute

1. Adding UI Translations

UI translations are managed through TranslateWiki. We welcome volunteers to contribute translations in their native languages.

When translation into a new language (language code <new_code>) starts on TranslateWiki, developers need to add support for it in the UI:

Add to ui/src/plugins/i18n.ts:
- Add the language to the supportedLanguages dictionary
- Specify its native name and whether it's RTL (right-to-left)
Update locale files:
- Add languageNames.<new_code> key in locales/en.json
- Add languageNames.<new_code> key in locales/qqq.json (documentation)
- Add languageNames.<new_code> key in locales/<new_code>.json

Example: see the commit that added Hindi support.
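The three locale updates listed above are easy to miss. The following Python sketch checks that en.json, qqq.json and <new_code>.json all define the language name. The function names are hypothetical, and because this guide does not say whether languageNames.<new_code> is stored as a flat dotted key or inside a nested languageNames object, the check accepts either layout.

```python
import json
from pathlib import Path


def has_language_name(data: dict, new_code: str) -> bool:
    """Return True if a parsed locale file defines the language name for new_code."""
    # Flat layout (assumption): {"languageNames.hi": "Hindi"}
    if f"languageNames.{new_code}" in data:
        return True
    # Nested layout (assumption): {"languageNames": {"hi": "Hindi"}}
    nested = data.get("languageNames")
    return isinstance(nested, dict) and new_code in nested


def missing_language_name(locale_dir: Path, new_code: str) -> list[str]:
    """List the required locale files that are absent or lack the key."""
    missing = []
    for name in ("en.json", "qqq.json", f"{new_code}.json"):
        path = locale_dir / name
        if not path.exists():
            missing.append(name)
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        if not has_language_name(data, new_code):
            missing.append(name)
    return missing
```

For example, missing_language_name(Path("locales"), "hi") returns an empty list once all three files define the key.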

2. Contributing Code

Python Scraper

The scraper is located in scraper/src/gutenberg2zim/. Key files:

entrypoint.py: CLI argument parsing
zim.py: ZIM file creation
download.py: Book downloading logic
export.py: JSON generation for Vue.js UI
rdf.py: RDF metadata parsing

Setup:

cd scraper
pip install hatch
hatch shell

Testing:

hatch run test:run

Linting:

Linux/macOS:

hatch run lint:all

Windows (hatch scripts don't work because of a pty limitation, so run the tools directly):

black src
ruff check src

Type Checking:

hatch run check:all

Documentation:

Before contributing, familiarize yourself with these key documents:

JSON File Structure: Detailed specification of the JSON schema used by the Vue.js UI. Essential reading if you're working on data export (export.py) or the Vue.js frontend. Explains the two-tier architecture (preview + detail files), file naming conventions, and loading strategies.
Gutenberg Structure: Comprehensive overview of the project architecture, including directory structure, Python-to-Vue.js data flow, Pydantic schemas, and design decisions. Useful for understanding how the scraper and UI work together.
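To illustrate the preview + detail split described in JSON File Structure, here is a simplified Python sketch of what a two-tier export might look like. It is not the code in export.py: the preview fields (id, title, author), the detail file name pattern book_<id>.json and the use of plain dicts instead of the project's Pydantic schemas are all assumptions for illustration; only books.json is named elsewhere in this guide.

```python
import json
from pathlib import Path

# Hypothetical preview fields; the real schema is in scraper/docs/JSON_FILE_STRUCTURE.md.
PREVIEW_FIELDS = ("id", "title", "author")


def split_book(book: dict) -> tuple[dict, dict]:
    """Split one full book record into a small preview entry and a detail record."""
    preview = {field: book[field] for field in PREVIEW_FIELDS}
    return preview, book


def export_two_tier(books: list[dict], out_dir: Path) -> None:
    """Write books.json (all previews) plus one detail file per book."""
    out_dir.mkdir(parents=True, exist_ok=True)
    previews = []
    for book in books:
        preview, detail = split_book(book)
        previews.append(preview)
        # Hypothetical naming convention for detail files.
        detail_path = out_dir / f"book_{book['id']}.json"
        detail_path.write_text(json.dumps(detail, ensure_ascii=False), encoding="utf-8")
    preview_path = out_dir / "books.json"
    preview_path.write_text(json.dumps(previews, ensure_ascii=False), encoding="utf-8")
```

The usual motivation for such a split is that a UI can load one small preview file to render lists and fetch a larger detail file only when a single item is opened.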

Vue.js UI

The UI is located in ui/src/. Key directories:

views/: Page components (Home, Book Detail, Author List, etc.)
components/: Reusable components
stores/: Pinia state management
router/: Vue Router configuration
plugins/: i18n and Vuetify setup

Setup:

cd ui
npm install

Development:

npm run dev

Build:

npm run build

Linting:

npm run lint

3. Developing the Vue.js UI with Real Data

When developing the UI, you need JSON assets (books.json, authors.json, etc.) generated by the scraper. Here's the recommended workflow:

1. Build the Docker image:

docker build -t local-gutenberg .

2. Generate a small ZIM file with JSON assets:

docker run --rm -it -v "$PWD/output":/output \
  local-gutenberg \
  gutenberg2zim --books 1,2,3 --languages en --formats html \
  --zim-file gutenberg_dev --output /output

Adjust --books, --languages, and --formats to match your test dataset.
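If you regenerate the test ZIM often, a small Python helper can build the same docker run command as a list of arguments, which sidesteps shell quoting differences between Bash and PowerShell. This is a sketch: the function name is hypothetical, it assumes --languages and --formats accept comma-separated lists the way --books does, and it drops -it because -t fails when the script runs without an attached terminal.

```python
import os


def dev_zim_command(
    books: list[int],
    languages: list[str],
    formats: list[str],
    zim_file: str = "gutenberg_dev",
    output_dir: str = "output",
) -> list[str]:
    """Build the argv for generating a small development ZIM with local-gutenberg."""
    volume = f"{os.path.abspath(output_dir)}:/output"
    return [
        "docker", "run", "--rm",
        "-v", volume,
        "local-gutenberg",
        "gutenberg2zim",
        "--books", ",".join(str(book) for book in books),
        # Assumption: comma-separated lists, mirroring the --books example.
        "--languages", ",".join(languages),
        "--formats", ",".join(formats),
        "--zim-file", zim_file,
        "--output", "/output",
    ]
```

Pass the result to subprocess.run(..., check=True), for example with dev_zim_command([1, 2, 3], ["en"], ["html"]).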

3. Extract the assets:

# Clean previous assets
find ui/public/ -mindepth 1 ! -name ".gitignore" -delete

# Extract from ZIM
docker run -it --rm -v "$(pwd)/output":/data ghcr.io/openzim/zim-tools:latest \
  zimdump dump --dir=/data/gutenberg_dev /data/gutenberg_dev.zim

# Move to UI public folder
mv output/gutenberg_dev/* ui/public/
rm -rf output/gutenberg_dev

On Windows, run these commands in WSL or adapt them to PowerShell.
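As an alternative to WSL or a PowerShell port, the clean, extract and move steps can also be scripted in Python, which behaves the same on Linux, macOS and Windows. This is a hypothetical helper, not part of the repository. Unlike the find command, it only preserves the top-level ui/public/.gitignore; the zim-tools image, the /data mount and the gutenberg_dev name are taken from the commands above.

```python
import shutil
from pathlib import Path


def clean_public_dir(public_dir: Path) -> None:
    """Delete everything in public_dir except its top-level .gitignore."""
    for entry in list(public_dir.iterdir()):
        if entry.name == ".gitignore":
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def zimdump_command(output_dir: Path, name: str = "gutenberg_dev") -> list[str]:
    """Build the argv that dumps <name>.zim into output_dir/<name>/ via zim-tools."""
    volume = f"{output_dir.resolve()}:/data"
    return [
        "docker", "run", "--rm", "-v", volume,
        "ghcr.io/openzim/zim-tools:latest",
        "zimdump", "dump", f"--dir=/data/{name}", f"/data/{name}.zim",
    ]


def move_extracted(extracted_dir: Path, public_dir: Path) -> None:
    """Move the dumped files into public_dir, then remove the emptied directory."""
    for entry in list(extracted_dir.iterdir()):
        shutil.move(str(entry), str(public_dir / entry.name))
    extracted_dir.rmdir()
```

A typical run calls clean_public_dir(Path("ui/public")), then subprocess.run(zimdump_command(Path("output")), check=True), then move_extracted(Path("output/gutenberg_dev"), Path("ui/public")).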

4. Start the UI development server:

cd ui
npm install
npm run dev

The UI will be available at http://localhost:5173 with hot reload.

Important: Clean ui/public/ before building the Docker image again to avoid shipping extracted assets in production.

Code Style

Python

Follow the PEP 8 style guide
Use ruff for linting (configured in pyproject.toml)
Use black for formatting
Run hatch run lint:all before committing

TypeScript/Vue

Follow the project's ESLint configuration
Use Prettier for formatting (configured in .prettierrc.json)
Run npm run lint before committing
Use TypeScript for type safety

Locale Files

Use TAB indentation (not spaces)
Use CRLF line endings (Windows style)
Keep keys sorted alphabetically
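A minimal Python sketch for checking and fixing these rules, assuming each locale file is a single JSON object that should end with one trailing CRLF (the trailing newline is an assumption, and the function names are hypothetical). Note that sort_keys orders keys by code point, so uppercase keys sort before lowercase ones; if the project expects a case-insensitive order, the sketch would need adjusting.

```python
import json
from pathlib import Path


def format_locale(data: dict) -> str:
    """Serialize locale data with TAB indentation, sorted keys and CRLF line endings."""
    # ensure_ascii=False keeps translated text readable instead of \uXXXX escapes.
    text = json.dumps(data, indent="\t", sort_keys=True, ensure_ascii=False)
    # json.dumps emits "\n" only between tokens (newlines inside strings are escaped),
    # so this replacement converts just the structural line breaks to CRLF.
    return text.replace("\n", "\r\n") + "\r\n"


def check_locale_file(path: Path) -> bool:
    """Return True if the file on disk already matches the expected formatting."""
    # newline="" disables newline translation so CRLF is read back unchanged.
    with path.open(encoding="utf-8", newline="") as f:
        current = f.read()
    return current == format_locale(json.loads(current))


def fix_locale_file(path: Path) -> None:
    """Rewrite the file in place with the expected formatting."""
    with path.open(encoding="utf-8", newline="") as f:
        data = json.load(f)
    with path.open("w", encoding="utf-8", newline="") as f:
        f.write(format_locale(data))
```

Running check_locale_file over every file in locales/ before committing catches indentation, line-ending and ordering drift in one pass.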

Pull Request Guidelines

Create a feature branch from main
Write clear commit messages describing what and why
Test your changes thoroughly
Update documentation if needed (README, CONTRIBUTING, etc.)
Ensure all tests pass and linting is clean
Keep PRs focused: one feature or fix per PR
Rebase and squash commits before submitting to keep history clean

Getting Help

Issues: Check existing issues or create a new one
Discussions: Use GitHub Discussions for questions
Code Review: Maintainers will review PRs and provide feedback

License

By contributing, you agree that your contributions will be licensed under the GPLv3 license.
