Contributing

Thank you for your interest in contributing to the Gutenberg scraper! This document provides guidelines for contributing to the project.

For general openZIM contribution guidelines, see the openZIM Contributing Wiki.

Project Structure

The project consists of several components:

  • scraper/: Python scraper that downloads books and generates ZIM files
  • ui/: Vue.js frontend that provides the user interface within the ZIM
  • locales/: UI translation files (multiple languages supported)
  • scraper/docs/: Technical documentation
    • JSON_FILE_STRUCTURE.md: JSON schema documentation for the Vue.js UI
    • GUTENBERG_STRUCTURE.md: Project Gutenberg structure and metadata documentation

Ways to Contribute

1. Adding UI Translations

UI translations are managed through TranslateWiki. We welcome volunteers to contribute translations in their native languages.

When translation starts for a new language with code <new_code>, a developer needs to add support for it:

  1. Add to ui/src/plugins/i18n.ts:

    • Add the language to the supportedLanguages dictionary
    • Specify its native name and whether it's RTL (right-to-left)
  2. Update locale files:

    • Add languageNames.<new_code> key in locales/en.json
    • Add languageNames.<new_code> key in locales/qqq.json (documentation)
    • Add languageNames.<new_code> key in locales/<new_code>.json

Example: See commit adding Hindi support
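
The locale-file step above is easy to get partly right, so a small check can help before opening a PR. The following is a minimal sketch, not part of the project: the helper names (`has_language_name`, `missing_language_name`, `load_locale`) are hypothetical, and because this document does not say whether `languageNames.<new_code>` is stored as a flat key or as a nested object, the sketch accepts either layout.

```python
import json
from pathlib import Path


def load_locale(path: Path) -> dict:
    """Read one locale JSON file; a missing file is treated as empty."""
    if not path.is_file():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def has_language_name(locale: dict, code: str) -> bool:
    """Return True if the locale data defines languageNames.<code>."""
    # Assumed flat layout: {"languageNames.hi": "..."}
    if f"languageNames.{code}" in locale:
        return True
    # Assumed nested layout: {"languageNames": {"hi": "..."}}
    nested = locale.get("languageNames")
    return isinstance(nested, dict) and code in nested


def missing_language_name(locales: dict[str, dict], code: str) -> list[str]:
    """List the locale names (en, qqq, <code>) that lack the entry."""
    required = list(dict.fromkeys(["en", "qqq", code]))  # de-duplicated, ordered
    return [
        name
        for name in required
        if not has_language_name(locales.get(name, {}), code)
    ]


if __name__ == "__main__":
    code = "hi"  # example language code
    names = ["en", "qqq", code]
    loaded = {name: load_locale(Path("locales") / f"{name}.json") for name in names}
    print(missing_language_name(loaded, code))
```

Run from the repository root, the script prints the locale files that still need the new key; an empty list means all three files define it.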

2. Contributing Code

Python Scraper

The scraper is located in scraper/src/gutenberg2zim/. Key files:

  • entrypoint.py: CLI argument parsing
  • zim.py: ZIM file creation
  • download.py: Book downloading logic
  • export.py: JSON generation for Vue.js UI
  • rdf.py: RDF metadata parsing

Setup:

cd scraper
pip install hatch
hatch shell

Testing:

hatch run test:run

Linting:

Linux/macOS:

hatch run lint:all

Windows (hatch scripts don't work there because of a pty limitation, so run the tools directly):

black src
ruff check src

Type Checking:

hatch run check:all

Documentation:

Before contributing, familiarize yourself with these key documents:

  • JSON File Structure: Detailed specification of the JSON schema used by the Vue.js UI. Essential reading if you're working on data export (export.py) or the Vue.js frontend. Explains the two-tier architecture (preview + detail files), file naming conventions, and loading strategies.

  • Gutenberg Structure: Comprehensive overview of the project architecture, including directory structure, Python-to-Vue.js data flow, Pydantic schemas, and design decisions. Useful for understanding how the scraper and UI work together.

Vue.js UI

The UI is located in ui/src/. Key directories:

  • views/: Page components (Home, Book Detail, Author List, etc.)
  • components/: Reusable components
  • stores/: Pinia state management
  • router/: Vue Router configuration
  • plugins/: i18n and Vuetify setup

Setup:

cd ui
npm install

Development:

npm run dev

Build:

npm run build

Linting:

npm run lint

3. Developing the Vue.js UI with Real Data

When developing the UI, you need JSON assets (books.json, authors.json, etc.) generated by the scraper. Here's the recommended workflow:

1. Build the Docker image:

docker build -t local-gutenberg .

2. Generate a small ZIM file with JSON assets:

docker run --rm -it -v "$PWD/output":/output \
  local-gutenberg \
  gutenberg2zim --books 1,2,3 --languages en --formats html \
  --zim-file gutenberg_dev --output /output

Adjust --books, --languages, and --formats to match your test dataset.

3. Extract the assets:

# Clean previous assets
find ui/public/ -mindepth 1 ! -name ".gitignore" -delete

# Extract from ZIM
docker run -it --rm -v "$PWD/output":/data ghcr.io/openzim/zim-tools:latest \
  zimdump dump --dir=/data/gutenberg_dev /data/gutenberg_dev.zim

# Move to UI public folder
mv output/gutenberg_dev/* ui/public/
rm -rf output/gutenberg_dev

On Windows, run these commands in WSL or adapt them to PowerShell.
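
For contributors who would rather not translate the `find` command to PowerShell, the clean step can be approximated in Python. This is a sketch under stated assumptions, not a project script: `clean_public` is a hypothetical helper that removes everything under a directory except files named `.gitignore`, and then removes any directory left empty.

```python
import os
from pathlib import Path


def clean_public(root: Path, keep: str = ".gitignore") -> None:
    """Delete everything below root except files named `keep`."""
    # Walk bottom-up so children are handled before their parent directory.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            if name != keep:
                (Path(dirpath) / name).unlink()
        for name in dirnames:
            child = Path(dirpath) / name
            if child.is_symlink():
                # A symlink to a directory is removed like a file.
                child.unlink()
                continue
            try:
                child.rmdir()
            except OSError:
                # Directory is not empty: it still holds a kept file.
                pass


if __name__ == "__main__":
    clean_public(Path("ui/public"))
```

The extraction and move steps still rely on Docker and can be run as written from any shell that supports the same quoting.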

4. Start the UI development server:

cd ui
npm install
npm run dev

The UI will be available at http://localhost:5173 with hot reload.

Important: Clean ui/public/ before building the Docker image again to avoid shipping extracted assets in production.

Code Style

Python

  • Follow PEP 8 style guide
  • Use ruff for linting (configured in pyproject.toml)
  • Use black for formatting
  • Run hatch run lint:all before committing

TypeScript/Vue

  • Follow the project's ESLint configuration
  • Use Prettier for formatting (configured in .prettierrc.json)
  • Run npm run lint before committing
  • Use TypeScript for type safety

Locale Files

  • Use TAB indentation (not spaces)
  • Use CRLF line endings (Windows style)
  • Keep keys sorted alphabetically
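
The three locale rules above can be checked mechanically. The sketch below is illustrative only: `check_locale_bytes` is a hypothetical helper, and it assumes "sorted alphabetically" means plain code-point order of the keys within each JSON object, which may differ from the ordering the project's tooling actually applies.

```python
import json


def check_locale_bytes(raw: bytes) -> list[str]:
    """Return a list of style problems found in one locale file."""
    problems: list[str] = []
    text = raw.decode("utf-8")

    # Rule: CRLF line endings. Any "\n" left after removing "\r\n" is a bare LF.
    if "\n" in text.replace("\r\n", ""):
        problems.append("LF-only line ending found (expected CRLF)")

    # Rule: TAB indentation. After leading tabs, a line must not start with a space.
    for number, line in enumerate(text.split("\n"), start=1):
        if line.rstrip("\r").lstrip("\t").startswith(" "):
            problems.append(f"line {number}: indented with spaces (expected tabs)")

    # Rule: sorted keys. The hook sees each object's keys in file order.
    def check_sorted(pairs: list[tuple[str, object]]) -> dict:
        keys = [key for key, _ in pairs]
        if keys != sorted(keys):
            problems.append(f"keys not sorted: {keys}")
        return dict(pairs)

    json.loads(text, object_pairs_hook=check_sorted)
    return problems
```

A caller would read each file in binary mode, for example `check_locale_bytes(open("locales/en.json", "rb").read())`, so that the original line endings are preserved for the check.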

Pull Request Guidelines

  1. Create a feature branch from main
  2. Write clear commit messages describing what and why
  3. Test your changes thoroughly
  4. Update documentation if needed (README, CONTRIBUTING, etc.)
  5. Ensure all tests pass and linting is clean
  6. Keep PRs focused: one feature or fix per PR
  7. Rebase and squash commits before submitting to keep history clean

Getting Help

  • Issues: Check existing issues or create a new one
  • Discussions: Use GitHub Discussions for questions
  • Code Review: Maintainers will review PRs and provide feedback

License

By contributing, you agree that your contributions will be licensed under the GPLv3 license.