Note: This document provides detailed specifications for the JSON file structure used by the Vue.js UI. It defines the data architecture, file organization, and loading strategies.
This document defines the complete JSON file structure for the Gutenberg Vue.js UI, following the pattern established in the YouTube scraper. The structure uses a two-tier approach: high-level preview files for listing and pagination, and detail files for full content.
High-level files contain preview/summary data and are loaded once when the UI initializes.
ZIM_ROOT/
├── books.json # All books (preview format)
├── authors.json # All authors (preview format)
├── lcc_shelves.json # All LCC shelves (preview format)
└── config.json # UI configuration
Detail files contain full details and are loaded on demand when a user views a specific resource.
ZIM_ROOT/
├── books/
│ ├── {id}.json # Individual book details (e.g., 12345.json)
├── authors/
│ ├── {id}.json # Author details + their books (e.g., 68.json)
└── lcc_shelves/
├── {code}.json # LCC shelf details + books (e.g., PR.json)
Books:
- High-level: `books.json`
- Detail files: `books/{id}.json`, where `{id}` is the numeric book ID (e.g., `12345.json`)
- Example: `books/1.json`, `books/12345.json`
Authors:
- High-level: `authors.json`
- Detail files: `authors/{id}.json`, where `{id}` is the author's `gut_id` (e.g., `68.json`)
- Example: `authors/68.json`, `authors/116.json` (for "Various")
LCC shelves:
- High-level: `lcc_shelves.json`
- Detail files: `lcc_shelves/{code}.json`, where `{code}` is the LCC shelf code (e.g., `PR.json`)
- Example: `lcc_shelves/PR.json`, `lcc_shelves/Q.json`
- Note: LCC codes are uppercase and may contain multiple characters (e.g., `PA`, `PR`, `PS`)
Configuration:
- File: `config.json` (always at the root level)
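The naming rules above are regular enough to centralize in one helper, so the scraper and any tooling agree on paths. The following is a minimal sketch in Python, the scraper's language. `detail_path` and its `kind` strings are hypothetical names, not part of the existing code. Uppercasing the shelf code on input is an assumption based on the note that LCC codes are uppercase.

```python
from __future__ import annotations


def detail_path(kind: str, identifier: int | str) -> str:
    """Return the ZIM-relative path of a detail file (hypothetical helper)."""
    if kind == "book":
        # Book IDs are numeric: 12345 -> books/12345.json
        return f"books/{int(identifier)}.json"
    if kind == "author":
        # Author IDs are the author's gut_id, kept as a string: "68"
        return f"authors/{identifier}.json"
    if kind == "lcc_shelf":
        # Assumption: normalize to uppercase, since LCC codes are uppercase
        return f"lcc_shelves/{str(identifier).upper()}.json"
    raise ValueError(f"unknown resource kind: {kind!r}")


print(detail_path("book", 12345))      # books/12345.json
print(detail_path("author", "116"))    # authors/116.json
print(detail_path("lcc_shelf", "pr"))  # lcc_shelves/PR.json
```

Raising on an unknown `kind` keeps a typo from silently producing a path that does not exist in the ZIM.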
High-level files:
- Target size: < 5MB per file (for large ZIMs with 70,000+ books)
- Optimization: include only the fields essential for listing
- Pagination: the frontend handles pagination and filtering client-side
Detail files:
- Typical size: 1-10KB per file
- Largest files: author detail files for prolific authors (may be 50-100KB); LCC shelf detail files for popular shelves, which may list 1,000+ book previews, can be larger
- Loading strategy: load on demand, cache in the browser
Performance targets:
- Initial load: < 2 seconds for high-level files
- Detail file load: < 100ms per file
- Total ZIM size impact: < 100MB for JSON files (even with 70,000 books)
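The size targets are easiest to keep honest if the build measures them. The sketch below assumes the high-level payload is a plain dict before serialization and reads "5MB" as 5,000,000 bytes. `encoded_size` and `within_budget` are hypothetical helpers. They measure the UTF-8 byte length, which is what ends up in the ZIM, rather than the character count.

```python
from __future__ import annotations

import json

# Assumption: the "< 5MB" target read as 5,000,000 bytes.
MAX_HIGH_LEVEL_BYTES = 5_000_000


def encoded_size(payload: dict, indent: int | None = None) -> int:
    """Byte length of the payload serialized as UTF-8 JSON."""
    text = json.dumps(payload, ensure_ascii=False, indent=indent)
    return len(text.encode("utf-8"))


def within_budget(payload: dict, limit: int = MAX_HIGH_LEVEL_BYTES) -> bool:
    # indent=2 approximates model_dump_json(indent=2); byte counts may differ slightly
    return encoded_size(payload, indent=2) < limit


# Synthetic payload: 70,000 copies of the books.json preview entry in this document
entry = {
    "id": 12345,
    "title": "Pride and Prejudice",
    "author": {"id": "68", "name": "Austen, Jane", "bookCount": 7},
    "languages": ["en"],
    "popularity": 5,
    "coverPath": "A/cover_article_12345.html",
    "lccShelf": "PR",
}
payload = {"books": [entry] * 70_000, "totalCount": 70_000}
print(encoded_size(payload), encoded_size(payload, indent=2), within_budget(payload))
```

Comparing the compact and indented sizes shows how much of the budget the indentation alone consumes.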
Example `books.json` (preview format):
{
"books": [
{
"id": 12345,
"title": "Pride and Prejudice",
"author": {
"id": "68",
"name": "Austen, Jane",
"bookCount": 7
},
"languages": ["en"],
"popularity": 5,
"coverPath": "A/cover_article_12345.html",
"lccShelf": "PR"
}
],
"totalCount": 70000
}
Fields included:
- `id`: book ID
- `title`: book title
- `author`: author preview (`id`, `name`, `bookCount`)
- `languages`: list of language codes
- `popularity`: star rating (0-5)
- `coverPath`: path to the cover image/article
- `lccShelf`: LCC shelf code (optional)
Fields excluded (to keep the file small):
- Full author details
- Formats list
- Downloads count
- Subtitle
- Description
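The preview format is defined as much by what it leaves out as by what it contains. A build-time check can therefore catch detail-only fields leaking into `books.json`. In the scraper this job belongs to the Pydantic models; the sketch below is a dependency-free stand-in, and `check_book_preview` is a hypothetical name. It reads two points from the field list above: `lccShelf` is the only optional key, and `popularity` is an integer star rating.

```python
from __future__ import annotations

# lccShelf is optional, so it is deliberately absent from REQUIRED_FIELDS
REQUIRED_FIELDS = {"id", "title", "author", "languages", "popularity", "coverPath"}
EXCLUDED_FIELDS = {"formats", "downloads", "subtitle", "description"}
AUTHOR_PREVIEW_FIELDS = {"id", "name", "bookCount"}


def check_book_preview(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry conforms."""
    problems = []
    keys = set(entry)
    missing = REQUIRED_FIELDS - keys
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    leaked = keys & EXCLUDED_FIELDS
    if leaked:
        problems.append(f"detail-only fields present: {sorted(leaked)}")
    # Full author details (firstName, birthYear, ...) belong in detail files only
    author = entry.get("author")
    if not isinstance(author, dict) or set(author) != AUTHOR_PREVIEW_FIELDS:
        problems.append("author must contain exactly id, name and bookCount")
    popularity = entry.get("popularity")
    if not isinstance(popularity, int) or not 0 <= popularity <= 5:
        problems.append("popularity must be an integer from 0 to 5")
    return problems


entry = {
    "id": 12345,
    "title": "Pride and Prejudice",
    "author": {"id": "68", "name": "Austen, Jane", "bookCount": 7},
    "languages": ["en"],
    "popularity": 5,
    "coverPath": "A/cover_article_12345.html",
    "lccShelf": "PR",
}
print(check_book_preview(entry))                          # []
print(check_book_preview({**entry, "downloads": 50000}))  # flags the leaked field
```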
Example `books/12345.json` (book detail):
{
"id": 12345,
"title": "Pride and Prejudice",
"subtitle": null,
"author": {
"id": "68",
"firstName": "Jane",
"lastName": "Austen",
"birthYear": "1775",
"deathYear": "1817",
"name": "Austen, Jane"
},
"languages": ["en"],
"license": "Public domain in the USA.",
"downloads": 50000,
"popularity": 5,
"lccShelf": "PR",
"coverPath": "A/cover_article_12345.html",
"formats": [
{
"format": "html",
"path": "A/12345.html",
"available": true
},
{
"format": "epub",
"path": "I/12345.epub",
"available": true
}
],
"description": null
}
All fields are included for full book details.
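The preview entries in `books.json` are a projection of these detail records, so the scraper could derive them from the same data instead of building them separately. The sketch below works under that assumption; `book_preview` is a hypothetical helper. The preview's `author.bookCount` does not appear in the detail record, so the caller is assumed to pass it in, for example from a per-author count over the whole catalog.

```python
def book_preview(detail: dict, author_book_count: int) -> dict:
    """Project a book detail record onto the preview shape used in listings."""
    author = detail["author"]
    return {
        "id": detail["id"],
        "title": detail["title"],
        # Preview author: only id, name and the number of books by this author
        "author": {
            "id": author["id"],
            "name": author["name"],
            "bookCount": author_book_count,
        },
        "languages": detail["languages"],
        "popularity": detail["popularity"],
        # Optional fields stay null when the detail record has no value
        "coverPath": detail.get("coverPath"),
        "lccShelf": detail.get("lccShelf"),
    }
```

Building previews this way means subtitle, description, downloads, license and formats can never leak into the listing file, because they are simply never copied.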
Example `authors.json` (preview format):
{
"authors": [
{
"id": "68",
"name": "Austen, Jane",
"bookCount": 7
}
],
"totalCount": 15000
}
Only the minimal fields needed for the author listing are included.
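Every book preview already carries its author's `id` and `name`, so `authors.json` can be built by grouping the previews. The sketch below assumes the previews cover the whole catalog, so each group's size equals `bookCount`. `build_authors_index` is hypothetical. Sorting authors by name is also an assumption, since this document does not specify an order.

```python
from __future__ import annotations

from collections import Counter


def build_authors_index(previews: list[dict]) -> dict:
    """Group book previews by author into the authors.json structure."""
    counts = Counter(p["author"]["id"] for p in previews)
    names = {p["author"]["id"]: p["author"]["name"] for p in previews}
    authors = [
        {"id": author_id, "name": names[author_id], "bookCount": counts[author_id]}
        # Assumption: list authors alphabetically by display name
        for author_id in sorted(names, key=names.__getitem__)
    ]
    return {"authors": authors, "totalCount": len(authors)}
```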
Example `authors/68.json` (author detail):
{
"id": "68",
"firstName": "Jane",
"lastName": "Austen",
"birthYear": "1775",
"deathYear": "1817",
"name": "Austen, Jane",
"books": [
{
"id": 12345,
"title": "Pride and Prejudice",
"author": {
"id": "68",
"name": "Austen, Jane",
"bookCount": 7
},
"languages": ["en"],
"popularity": 5,
"coverPath": "A/cover_article_12345.html",
"lccShelf": "PR"
}
],
"bookCount": 7
}
Includes the full author details plus a list of their books (as previews).
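An author detail file combines the author record with that author's book previews. The sketch below assumes the author record already has the fields shown in the example above and that the file lists every book by the author. `build_author_detail` is hypothetical, and ordering the books by descending `popularity` is an assumption.

```python
from __future__ import annotations


def build_author_detail(author: dict, previews: list[dict]) -> dict:
    """Build the authors/{id}.json payload for one author."""
    books = [p for p in previews if p["author"]["id"] == author["id"]]
    # Assumption: most popular books first; sorted() keeps ties in input order
    books = sorted(books, key=lambda p: p["popularity"], reverse=True)
    return {**author, "books": books, "bookCount": len(books)}
```

Deriving `bookCount` from `len(books)` keeps the number in the detail file consistent with the list it describes.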
Example `lcc_shelves.json` (preview format):
{
"shelves": [
{
"code": "PR",
"name": "English literature",
"bookCount": 5000
}
],
"totalCount": 200
}
Only the minimal fields needed for the shelf listing are included.
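`lcc_shelves.json` can be derived from the book previews by counting them per `lccShelf` and skipping books whose shelf is `null`. Shelf names such as "English literature" cannot be derived from the books. This sketch therefore takes them from a `shelf_names` mapping, a hypothetical input standing in for wherever the scraper gets LCC class names. `build_shelves_index` is likewise hypothetical, and falling back to the code when no name is known is an assumption.

```python
from __future__ import annotations

from collections import Counter


def build_shelves_index(previews: list[dict], shelf_names: dict[str, str]) -> dict:
    """Count previews per LCC shelf into the lcc_shelves.json structure."""
    # Books without a shelf (lccShelf null or absent) are not counted
    counts = Counter(p["lccShelf"] for p in previews if p.get("lccShelf"))
    shelves = [
        # Assumption: fall back to the code itself when no name is known
        {"code": code, "name": shelf_names.get(code, code), "bookCount": counts[code]}
        for code in sorted(counts)
    ]
    return {"shelves": shelves, "totalCount": len(shelves)}
```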
Example `lcc_shelves/PR.json` (shelf detail):
{
"code": "PR",
"name": "English literature",
"bookCount": 5000,
"books": [
{
"id": 12345,
"title": "Pride and Prejudice",
"author": {
"id": "68",
"name": "Austen, Jane",
"bookCount": 7
},
"languages": ["en"],
"popularity": 5,
"coverPath": "A/cover_article_12345.html",
"lccShelf": "PR"
}
]
}
Includes the shelf details plus a list of the books on the shelf (as previews).
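Shelf detail files repeat data that also appears in the high-level files, so a consistency check at build time can catch drift. The sketch below assumes each file lists every book on the shelf; the example above is abbreviated to a single book, so it would not pass. `check_shelf_detail` is a hypothetical helper.

```python
from __future__ import annotations


def check_shelf_detail(shelf: dict) -> list[str]:
    """Return problems found in an lcc_shelves/{code}.json payload."""
    problems = []
    code = shelf.get("code", "")
    # LCC shelf codes are uppercase (e.g. PR, PA, PS)
    if not code or code != code.upper():
        problems.append(f"invalid shelf code: {code!r}")
    books = shelf.get("books", [])
    # Assumption: the file lists every book, so the counts must agree
    if shelf.get("bookCount") != len(books):
        problems.append(f"bookCount is {shelf.get('bookCount')}, but the file lists {len(books)}")
    for book in books:
        if book.get("lccShelf") != code:
            problems.append(f"book {book.get('id')} has lccShelf {book.get('lccShelf')!r}")
    return problems
```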
Example `config.json`:
{
"title": "Project Gutenberg Library",
"description": "Free eBooks from Project Gutenberg",
"primaryColor": null,
"secondaryColor": null
}
UI configuration for theming and branding.
On UI initialization:
- Load `config.json`: UI configuration
- Load `books.json`: all book previews (for the main listing)
- Load `authors.json`: all author previews (for the author listing)
- Load `lcc_shelves.json`: all shelf previews (for the shelf listing)
On demand:
- Book detail page: load `books/{id}.json`
- Author detail page: load `authors/{id}.json`
- LCC shelf page: load `lcc_shelves/{code}.json`
Caching:
- Cache high-level files in the Pinia store (loaded once)
- Cache detail files in the Pinia store (loaded on first access)
- Use the browser's HTTP cache for subsequent ZIM access
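The loading and caching rules above amount to a memoized fetch. High-level files are fetched once at startup; each detail file is fetched the first time it is requested and served from memory afterwards. The real store is a Pinia store using Axios in TypeScript. The sketch below only illustrates the same logic in Python. `CatalogStore` is a hypothetical class, and its `fetch` callable stands in for the HTTP request.

```python
from __future__ import annotations

import json
from typing import Callable


class CatalogStore:
    """In-memory cache mirroring the loading strategy (illustrative only)."""

    HIGH_LEVEL = ("config.json", "books.json", "authors.json", "lcc_shelves.json")

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # returns the raw JSON text for a ZIM path
        self._cache: dict[str, dict] = {}

    def _load(self, path: str) -> dict:
        # Fetch and parse only on the first request for a given path
        if path not in self._cache:
            self._cache[path] = json.loads(self._fetch(path))
        return self._cache[path]

    def init(self) -> None:
        """Load the four high-level files when the UI starts."""
        for path in self.HIGH_LEVEL:
            self._load(path)

    def book(self, book_id: int) -> dict:
        return self._load(f"books/{book_id}.json")

    def author(self, author_id: str) -> dict:
        return self._load(f"authors/{author_id}.json")

    def shelf(self, code: str) -> dict:
        return self._load(f"lcc_shelves/{code}.json")


# Usage with a fake fetch that records which paths were requested
requested: list[str] = []


def fake_fetch(path: str) -> str:
    requested.append(path)
    return json.dumps({"path": path})


store = CatalogStore(fake_fetch)
store.init()
store.book(12345)
store.book(12345)  # served from the cache, no second fetch
print(len(requested))  # 5
```

Error handling for missing files, which the frontend also needs, is left out to keep the caching logic visible.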
Missing data:
- Missing author: use "Anonymous" (ID: "216") or "Various" (ID: "116")
- Missing LCC shelf: the `lccShelf` field is `null`
- Missing cover: `coverPath` is `null`
- Missing format: the format is marked as `available: false` in the `formats` array
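The fallbacks above can be applied in one normalization pass before serialization. The sketch below rests on several assumptions:
- `normalize_book` and `FORMATS` are hypothetical names.
- The format list is illustrative, not the scraper's actual list.
- An unavailable format gets `path: null`.
- A book without an author is assigned "Anonymous". This document does not say when "Various" should be used instead, so the sketch does not model that choice.

```python
from __future__ import annotations

ANONYMOUS = {"id": "216", "name": "Anonymous"}
# Hypothetical format list; the real set depends on the scraper's options
FORMATS = ("html", "epub")


def normalize_book(raw: dict, available: dict[str, str]) -> dict:
    """Apply the missing-data rules; `available` maps format -> ZIM path."""
    book = dict(raw)
    if not book.get("author"):
        book["author"] = dict(ANONYMOUS)
    # Missing shelf or cover: the field is present and null
    book["lccShelf"] = book.get("lccShelf") or None
    book["coverPath"] = book.get("coverPath") or None
    # Missing format: still listed, but marked as unavailable
    book["formats"] = [
        {"format": fmt, "path": available.get(fmt), "available": fmt in available}
        for fmt in FORMATS
    ]
    return book
```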
Conventions:
- File names: use numeric IDs for books, alphanumeric IDs for authors, uppercase codes for LCC shelves
- JSON encoding: all files use UTF-8 encoding
- Path separators: use forward slashes (`/`) in JSON paths (ZIM standard)
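Two of these conventions are easy to break. When the scraper runs on Windows, `os.path`-style joins can introduce backslashes into paths. When titles contain non-ASCII characters, serializers that escape them produce `\u00c9`-style sequences instead of readable UTF-8. The sketch below shows a safeguard for each. `to_zim_path` and `encode_json` are hypothetical helpers; `ensure_ascii=False` is the standard-library `json` option that disables the escaping.

```python
import json


def to_zim_path(path: str) -> str:
    """Normalize a path for JSON output: forward slashes only (ZIM convention)."""
    return path.replace("\\", "/")


def encode_json(payload) -> bytes:
    """Serialize to UTF-8 bytes, keeping non-ASCII characters unescaped."""
    text = json.dumps(payload, ensure_ascii=False, indent=2)
    return text.encode("utf-8")


print(to_zim_path("A\\cover_article_12345.html"))  # A/cover_article_12345.html
print(encode_json({"title": "Émile"}).decode("utf-8"))
```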
Scalability:
- 70,000+ books: the high-level `books.json` may be large; the frontend implements virtual scrolling/pagination
- Prolific authors: author detail files may contain 100+ books; the frontend paginates the book list
- Popular shelves: LCC shelf detail files may contain 1,000+ books; the frontend paginates
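Client-side pagination of these lists is simple index arithmetic. The frontend would implement it in TypeScript; the sketch below shows the same arithmetic in Python. Several choices here are assumptions rather than documented behavior: the name `paginate`, 1-based page numbers, the default page size of 50, and clamping out-of-range pages.

```python
from __future__ import annotations

import math


def paginate(items: list, page: int, page_size: int = 50) -> tuple[list, int]:
    """Return the items on a 1-based page and the total number of pages."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    # An empty list still has one (empty) page
    total_pages = max(1, math.ceil(len(items) / page_size))
    # Clamp out-of-range page numbers instead of failing
    page = min(max(page, 1), total_pages)
    start = (page - 1) * page_size
    return items[start:start + page_size], total_pages


books = list(range(120))
page_items, pages = paginate(books, page=3)
print(pages, page_items[0], len(page_items))  # 3 100 20
```

Virtual scrolling applies the same idea with a window computed from the scroll offset instead of a page number.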
Backend (scraper):
- Use Pydantic models for type safety and validation
- Generate JSON with `model_dump_json(by_alias=True, indent=2)` (`by_alias=True` writes each field under its alias)
- Ensure all paths use forward slashes
- Handle missing/optional fields gracefully
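The JSON keys in the examples are camelCase (`coverPath`, `bookCount`). Serializing by alias is presumably how snake_case Python attributes become those keys. The scraper uses Pydantic models for this. The sketch below reproduces the idea with only the standard library, dataclasses plus a snake_case-to-camelCase key conversion, so it runs without dependencies. `to_camel`, `camelize`, `dump_json` and the two dataclasses are hypothetical stand-ins, not the scraper's actual models.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


def to_camel(name: str) -> str:
    """snake_case -> camelCase, e.g. cover_path -> coverPath."""
    head, *rest = name.split("_")
    return head + "".join(word.capitalize() for word in rest)


def camelize(value):
    # Recursively rename dict keys; lists and scalars pass through
    if isinstance(value, dict):
        return {to_camel(k): camelize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [camelize(v) for v in value]
    return value


@dataclass
class AuthorPreview:
    id: str
    name: str
    book_count: int


@dataclass
class BookPreview:
    id: int
    title: str
    author: AuthorPreview
    languages: list[str]
    popularity: int
    cover_path: str | None = None
    lcc_shelf: str | None = None


def dump_json(obj) -> str:
    """Rough stand-in for model_dump_json(by_alias=True, indent=2)."""
    return json.dumps(camelize(asdict(obj)), ensure_ascii=False, indent=2)


book = BookPreview(
    id=12345,
    title="Pride and Prejudice",
    author=AuthorPreview(id="68", name="Austen, Jane", book_count=7),
    languages=["en"],
    popularity=5,
    cover_path="A/cover_article_12345.html",
    lcc_shelf="PR",
)
print(dump_json(book))
```

With Pydantic, the same mapping would live in the models' alias configuration rather than in a post-processing step.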
Frontend (Vue.js):
- Use Axios for JSON fetching
- Implement error handling for missing files
- Use TypeScript interfaces matching the Pydantic schemas
- Cache loaded data in the Pinia store
ZIM packaging:
- All JSON files use `mimetype="application/json"`
- Files are not marked as `is_front=True` (except `config.json`, if needed)
- Ensure proper UTF-8 encoding
The old format used JavaScript files such as:
- `full_by_popularity.js` (containing `var json_data = [...]`)
- `lang_en_by_popularity.js`
- `auth_68_by_popularity.js`
New format advantages:
- Standard JSON (easier to parse)
- On-demand loading (better performance)
- Type-safe (Pydantic validation)
- Folder-based organization (scalable)
The old format has been removed. The scraper now only generates the new JSON files for the Vue.js UI.