JSON File Structure Specification

Note: This document provides detailed specifications for the JSON file structure used by the Vue.js UI. It defines the data architecture, file organization, and loading strategies.

Overview

This document defines the complete JSON file structure for the Gutenberg Vue.js UI, following the pattern established in the YouTube scraper. The structure has two tiers: high-level preview files for listing and pagination, and detail files for full content.

File Organization

High-Level Files (Root Level)

These files contain preview/summary data and are loaded once when the UI initializes.

ZIM_ROOT/
├── books.json              # All books (preview format)
├── authors.json            # All authors (preview format)
├── lcc_shelves.json        # All LCC shelves (preview format)
└── config.json             # UI configuration

Detail Files (Folder-Based)

These files contain full details and are loaded on-demand when a user views a specific resource.

ZIM_ROOT/
├── books/
│   └── {id}.json           # Individual book details (e.g., 12345.json)
├── authors/
│   └── {id}.json           # Author details + their books (e.g., 68.json)
└── lcc_shelves/
    └── {code}.json         # LCC shelf details + books (e.g., PR.json)

File Naming Conventions

Books

High-level: books.json
Detail files: books/{id}.json where {id} is the numeric book ID (e.g., 12345.json)
Example: books/1.json, books/12345.json

Authors

High-level: authors.json
Detail files: authors/{id}.json where {id} is the author's gut_id (e.g., 68.json)
Example: authors/68.json, authors/116.json (for "Various")

LCC Shelves

High-level: lcc_shelves.json
Detail files: lcc_shelves/{code}.json where {code} is the LCC shelf code (e.g., PR.json)
Example: lcc_shelves/PR.json, lcc_shelves/Q.json
Note: LCC codes are uppercase and may contain multiple characters (e.g., PA, PR, PS)

Configuration

File: config.json (always at root level)
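
The naming rules above can be captured in a few small helpers. This is a minimal sketch, not the scraper's actual code: the function names are hypothetical, and the validation patterns are assumptions (author IDs are treated as alphanumeric, LCC codes as one or more uppercase ASCII letters).

```python
import re

CONFIG_PATH = "config.json"  # always at the root level


def book_detail_path(book_id: int) -> str:
    """Return the detail-file path for a numeric book ID."""
    if isinstance(book_id, bool) or not isinstance(book_id, int) or book_id < 0:
        raise ValueError(f"book ID must be a non-negative integer: {book_id!r}")
    return f"books/{book_id}.json"


def author_detail_path(author_id: str) -> str:
    """Return the detail-file path for an author's gut_id."""
    # Assumption: author IDs contain only ASCII letters and digits.
    if not re.fullmatch(r"[A-Za-z0-9]+", author_id):
        raise ValueError(f"unexpected author ID: {author_id!r}")
    return f"authors/{author_id}.json"


def shelf_detail_path(code: str) -> str:
    """Return the detail-file path for an LCC shelf code such as PR or Q."""
    # Assumption: shelf codes are one or more uppercase ASCII letters.
    if not re.fullmatch(r"[A-Z]+", code):
        raise ValueError(f"unexpected LCC shelf code: {code!r}")
    return f"lcc_shelves/{code}.json"
```

Rejecting unexpected IDs early keeps malformed names out of the ZIM, where a mismatch would only surface later as a missing detail file.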

Data Size Considerations

High-Level Files

Target size: < 5MB per file (for large ZIMs with 70,000+ books)
Optimization: Only include essential fields for listing
Pagination: Frontend handles client-side pagination/filtering

Detail Files

Typical size: 1-10KB per file
Largest files: Author detail files with many books (may be 50-100KB for prolific authors)
Loading strategy: Load on-demand, cache in browser

Performance Targets

Initial load: < 2 seconds for high-level files
Detail file load: < 100ms per file
Total ZIM size impact: < 100MB for JSON files (even with 70,000 books)
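
A back-of-envelope check helps put these targets in context. The sketch below assumes decimal megabytes and measures the minified size of the sample preview entry used in this document. This document does not say whether the targets refer to raw or compressed size, so treat the result as a sanity check, not a verdict.

```python
import json

BOOK_COUNT = 70_000                  # collection size quoted above
HIGH_LEVEL_TARGET = 5 * 1000**2      # 5 MB, assuming decimal megabytes
TOTAL_JSON_TARGET = 100 * 1000**2    # 100 MB, same assumption


def bytes_per_item(budget_bytes: int, item_count: int) -> float:
    """Average number of bytes each item may use within a budget."""
    return budget_bytes / item_count


def minified_size(obj) -> int:
    """Size in bytes of obj serialized as compact UTF-8 JSON."""
    text = json.dumps(obj, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


# Sample preview entry, as it appears in books.json.
sample_preview = {
    "id": 12345,
    "title": "Pride and Prejudice",
    "author": {"id": "68", "name": "Austen, Jane", "bookCount": 7},
    "languages": ["en"],
    "popularity": 5,
    "coverPath": "A/cover_article_12345.html",
    "lccShelf": "PR",
}

preview_budget = bytes_per_item(HIGH_LEVEL_TARGET, BOOK_COUNT)
detail_budget = bytes_per_item(TOTAL_JSON_TARGET, BOOK_COUNT)
sample_bytes = minified_size(sample_preview)

print(f"budget per preview entry: {preview_budget:.1f} bytes")
print(f"sample preview entry:     {sample_bytes} bytes")
print(f"budget per book overall:  {detail_budget:.1f} bytes")
```

At 70,000 books the 5 MB target leaves roughly 71 bytes per preview entry, and the sample entry minifies to considerably more than that. The 100 MB total leaves about 1.4 KB per book across all JSON files, near the low end of the typical detail-file size. Under these assumptions the targets are only reachable with compression or leaner entries.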

File Contents Specification

`books.json`

{
  "books": [
    {
      "id": 12345,
      "title": "Pride and Prejudice",
      "author": {
        "id": "68",
        "name": "Austen, Jane",
        "bookCount": 7
      },
      "languages": ["en"],
      "popularity": 5,
      "coverPath": "A/cover_article_12345.html",
      "lccShelf": "PR"
    }
  ],
  "totalCount": 70000
}

Fields included:

id: Book ID
title: Book title
author: Author preview (id, name, bookCount)
languages: List of language codes
popularity: Star rating (0-5)
coverPath: Path to cover image/article
lccShelf: LCC shelf code (optional)

Fields excluded (to keep file small):

Full author details
Formats list
Downloads count
Subtitle
Description
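
The preview shape can be modelled on the generator side as follows. This is a sketch using standard-library dataclasses as a stand-in for the Pydantic models mentioned under Implementation Notes. The class names, the `to_json` methods, and `build_books_index` are hypothetical, and setting `totalCount` to the number of entries is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AuthorPreview:
    id: str
    name: str
    book_count: int

    def to_json(self) -> dict:
        # Emit camelCase keys, as used in the JSON files.
        return {"id": self.id, "name": self.name, "bookCount": self.book_count}


@dataclass
class BookPreview:
    id: int
    title: str
    author: AuthorPreview
    languages: List[str]
    popularity: int                   # star rating, 0-5
    cover_path: Optional[str] = None  # null when no cover exists
    lcc_shelf: Optional[str] = None   # optional LCC shelf code

    def to_json(self) -> dict:
        return {
            "id": self.id,
            "title": self.title,
            "author": self.author.to_json(),
            "languages": self.languages,
            "popularity": self.popularity,
            "coverPath": self.cover_path,
            "lccShelf": self.lcc_shelf,
        }


def build_books_index(books: List[BookPreview]) -> dict:
    """Assemble the books.json payload from preview objects."""
    # Assumption: totalCount equals the number of entries in the file.
    return {"books": [book.to_json() for book in books], "totalCount": len(books)}
```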

`books/{id}.json`

{
  "id": 12345,
  "title": "Pride and Prejudice",
  "subtitle": null,
  "author": {
    "id": "68",
    "firstName": "Jane",
    "lastName": "Austen",
    "birthYear": "1775",
    "deathYear": "1817",
    "name": "Austen, Jane"
  },
  "languages": ["en"],
  "license": "Public domain in the USA.",
  "downloads": 50000,
  "popularity": 5,
  "lccShelf": "PR",
  "coverPath": "A/cover_article_12345.html",
  "formats": [
    {
      "format": "html",
      "path": "A/12345.html",
      "available": true
    },
    {
      "format": "epub",
      "path": "I/12345.epub",
      "available": true
    }
  ],
  "description": null
}

All fields included for full book details.
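
Because a preview is a subset of the detail record, one can be derived from the other. The sketch below shows one way to do that on plain dictionaries. `to_preview` is a hypothetical helper. The author's `bookCount` is passed in separately because the detail record's author object does not carry it.

```python
# Top-level keys that the preview format shares with the detail format.
PREVIEW_FIELDS = ("id", "title", "languages", "popularity", "coverPath", "lccShelf")


def to_preview(detail: dict, author_book_count: int) -> dict:
    """Reduce a books/{id}.json record to its books.json preview form."""
    preview = {key: detail.get(key) for key in PREVIEW_FIELDS}
    author = detail["author"]
    # The preview keeps only the author's id and name, plus a book count.
    preview["author"] = {
        "id": author["id"],
        "name": author["name"],
        "bookCount": author_book_count,
    }
    return preview
```

Fields such as `formats`, `downloads`, `subtitle`, and `description` are dropped simply because they are not listed in `PREVIEW_FIELDS`.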

`authors.json`

{
  "authors": [
    {
      "id": "68",
      "name": "Austen, Jane",
      "bookCount": 7
    }
  ],
  "totalCount": 15000
}

Minimal fields for author listing.

`authors/{id}.json`

{
  "id": "68",
  "firstName": "Jane",
  "lastName": "Austen",
  "birthYear": "1775",
  "deathYear": "1817",
  "name": "Austen, Jane",
  "books": [
    {
      "id": 12345,
      "title": "Pride and Prejudice",
      "author": {
        "id": "68",
        "name": "Austen, Jane",
        "bookCount": 7
      },
      "languages": ["en"],
      "popularity": 5,
      "coverPath": "A/cover_article_12345.html",
      "lccShelf": "PR"
    }
  ],
  "bookCount": 7
}

Includes: Full author details + list of their books (as previews).

`lcc_shelves.json`

{
  "shelves": [
    {
      "code": "PR",
      "name": "English literature",
      "bookCount": 5000
    }
  ],
  "totalCount": 200
}

Minimal fields for shelf listing.

`lcc_shelves/{code}.json`

{
  "code": "PR",
  "name": "English literature",
  "bookCount": 5000,
  "books": [
    {
      "id": 12345,
      "title": "Pride and Prejudice",
      "author": {
        "id": "68",
        "name": "Austen, Jane",
        "bookCount": 7
      },
      "languages": ["en"],
      "popularity": 5,
      "coverPath": "A/cover_article_12345.html",
      "lccShelf": "PR"
    }
  ]
}

Includes: Shelf details + list of books in shelf (as previews).
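
Shelf detail payloads can be produced by grouping book previews on their `lccShelf` value. The following is a sketch under stated assumptions: `build_shelf_details` and the `shelf_names` mapping are hypothetical, books without a shelf are skipped, and `bookCount` is taken to be the length of the grouped list.

```python
from collections import defaultdict
from typing import Dict, List


def build_shelf_details(previews: List[dict], shelf_names: Dict[str, str]) -> Dict[str, dict]:
    """Map each lcc_shelves/{code}.json path to its payload."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for book in previews:
        code = book.get("lccShelf")
        if code is not None:  # books with no shelf belong to no shelf file
            grouped[code].append(book)

    return {
        f"lcc_shelves/{code}.json": {
            "code": code,
            "name": shelf_names.get(code),  # None if the name is unknown
            "bookCount": len(books),
            "books": books,
        }
        for code, books in grouped.items()
    }
```

The same grouping approach would apply to author detail files, keyed on the author ID instead of the shelf code.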

`config.json`

{
  "title": "Project Gutenberg Library",
  "description": "Free eBooks from Project Gutenberg",
  "primaryColor": null,
  "secondaryColor": null
}

UI configuration for theming and branding.

Loading Strategy

Initial Load (UI Startup)

Load config.json - UI configuration
Load books.json - All book previews (for main listing)
Load authors.json - All author previews (for author listing)
Load lcc_shelves.json - All shelf previews (for shelf listing)

On-Demand Load

Book detail page: Load books/{id}.json
Author detail page: Load authors/{id}.json
LCC shelf page: Load lcc_shelves/{code}.json

Caching Strategy

Cache high-level files in Pinia store (loaded once)
Cache detail files in Pinia store (loaded on first access)
Rely on the browser's HTTP cache for repeat requests to the ZIM
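
The load-once behaviour described above amounts to a small keyed cache in front of the fetch call. The sketch below expresses that logic in Python purely for illustration. The real frontend would implement it in TypeScript with Pinia and Axios, and `JsonCache`, `fetch_text`, and `load_initial` are hypothetical names.

```python
import json
from typing import Callable, Dict

HIGH_LEVEL_FILES = ("config.json", "books.json", "authors.json", "lcc_shelves.json")


class JsonCache:
    """Fetch each JSON file at most once and keep the parsed result."""

    def __init__(self, fetch_text: Callable[[str], str]) -> None:
        self._fetch_text = fetch_text         # returns the raw file content
        self._entries: Dict[str, object] = {}

    def get(self, path: str):
        if path not in self._entries:         # first access: fetch and parse
            self._entries[path] = json.loads(self._fetch_text(path))
        return self._entries[path]            # later accesses: cached value


def load_initial(cache: JsonCache) -> Dict[str, object]:
    """Load the four high-level files needed at UI startup."""
    return {path: cache.get(path) for path in HIGH_LEVEL_FILES}
```

Detail files go through the same `get` call, for example `cache.get("books/12345.json")`, so a second visit to a book page triggers no new fetch.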

Edge Cases

Missing Data

Missing author: Use "Anonymous" (ID: "216") or "Various" (ID: "116")
Missing LCC shelf: lccShelf field is null
Missing cover: coverPath is null
Missing format: Format marked as available: false in formats array
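
These rules could be applied in a single normalization pass before a book record is written. The sketch below makes several assumptions that this document does not specify: it always falls back to "Anonymous" and never to "Various", it treats `html` and `epub` as the expected formats, and it sets `path` to null for an unavailable format. `apply_missing_data_rules` is a hypothetical name.

```python
ANONYMOUS_AUTHOR = {"id": "216", "name": "Anonymous"}
EXPECTED_FORMATS = ("html", "epub")  # assumption: the formats shown in the example


def apply_missing_data_rules(raw: dict) -> dict:
    """Return a copy of a book record with the missing-data rules applied."""
    book = dict(raw)

    # Missing author: substitute the placeholder author.
    if not book.get("author"):
        book["author"] = dict(ANONYMOUS_AUTHOR)

    # Missing shelf or cover: make the null explicit.
    book["lccShelf"] = book.get("lccShelf")
    book["coverPath"] = book.get("coverPath")

    # Missing format: add an entry flagged as unavailable.
    formats = list(book.get("formats") or [])
    present = {entry["format"] for entry in formats}
    for name in EXPECTED_FORMATS:
        if name not in present:
            formats.append({"format": name, "path": None, "available": False})
    book["formats"] = formats
    return book
```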

Special Characters

File names: Use numeric IDs for books, alphanumeric IDs for authors, uppercase codes for LCC shelves
JSON encoding: All files use UTF-8 encoding
Path separators: Use forward slashes (/) in JSON paths (ZIM standard)

Large Collections

70,000+ books: High-level books.json may be large; frontend implements virtual scrolling/pagination
Prolific authors: Author detail files may contain 100+ books; frontend paginates the book list
Popular shelves: LCC shelf detail files may contain 1000+ books; frontend paginates
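
Client-side pagination over an already loaded list is a slice plus a page count. The sketch below shows that arithmetic in Python for illustration only. The frontend would do the equivalent in TypeScript, and the function names and 1-based page numbering are assumptions.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def page_count(total_items: int, page_size: int) -> int:
    """Number of pages needed to show total_items."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return math.ceil(total_items / page_size)


def paginate(items: Sequence[T], page: int, page_size: int) -> List[T]:
    """Return the items on a 1-based page; the last page may be short."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    start = (page - 1) * page_size
    return list(items[start:start + page_size])
```

Only the current slice has to be rendered, which keeps the DOM small even when `books.json` holds tens of thousands of entries.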

Implementation Notes

Python Side

Use Pydantic models for type safety and validation
Generate JSON with model_dump_json(by_alias=True, indent=2)
Ensure all paths use forward slashes
Handle missing/optional fields gracefully
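
The path and encoding rules can be enforced at serialization time. The sketch below uses only the standard library as a stand-in for the Pydantic `model_dump_json` call. `normalize_paths` and `dump_json` are hypothetical helpers, and treating every key that ends in "path" as a ZIM path is an assumption.

```python
import json


def normalize_paths(value):
    """Recursively replace backslashes with forward slashes in path fields."""
    if isinstance(value, dict):
        result = {}
        for key, item in value.items():
            # Assumption: keys such as "path" and "coverPath" hold ZIM paths.
            if key.lower().endswith("path") and isinstance(item, str):
                result[key] = item.replace("\\", "/")
            else:
                result[key] = normalize_paths(item)
        return result
    if isinstance(value, list):
        return [normalize_paths(item) for item in value]
    return value


def dump_json(payload) -> bytes:
    """Serialize a payload as indented UTF-8 JSON with normalized paths."""
    # ensure_ascii=False keeps non-ASCII titles and names readable in the file.
    text = json.dumps(normalize_paths(payload), indent=2, ensure_ascii=False)
    return text.encode("utf-8")
```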

Vue.js Side

Use Axios for JSON fetching
Implement error handling for missing files
Use TypeScript interfaces matching Pydantic schemas
Cache loaded data in Pinia store

ZIM Export

All JSON files use mimetype="application/json"
JSON files are not marked as is_front=True (except config.json, if needed)
Ensure proper UTF-8 encoding

Migration from Old Format

The old format used JavaScript files like:

full_by_popularity.js (containing var json_data = [...])
lang_en_by_popularity.js
auth_68_by_popularity.js

New format advantages:

Standard JSON (easier to parse)
On-demand loading (better performance)
Type-safe (Pydantic validation)
Folder-based organization (scalable)

The old format has been removed. The scraper now only generates the new JSON files for the Vue.js UI.
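
If a legacy file ever needs to be read, for example to compare old and new output, the JavaScript wrapper can be stripped to recover the data. This sketch assumes the file consists of a single `var json_data = [...]` statement whose array literal is valid JSON. `parse_legacy_js` is a hypothetical helper, and the internal layout of the array is not covered here.

```python
import json

LEGACY_PREFIX = "var json_data ="


def parse_legacy_js(source: str):
    """Extract the array assigned to json_data in an old-format .js file."""
    text = source.strip()
    if not text.startswith(LEGACY_PREFIX):
        raise ValueError("not a legacy json_data file")
    body = text[len(LEGACY_PREFIX):].strip()
    if body.endswith(";"):  # drop the optional statement terminator
        body = body[:-1]
    return json.loads(body)
```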
