
Add Pagefind backend indexing and search route#40

Open
ritorhymes wants to merge 20 commits into eips-wg:master from ritovision:preprocessor/search-pagefind-foundation

ritorhymes commented May 18, 2026


Built on top of the v2 multi-repo local build system #15.

Also built on top of Generate Proposal Metadata JSON #34.

Part of the Pagefind search rollout coordinated in eips-wg/preprocessor#39. This is the first preprocessor PR in the search stack; #41 and #42 build on this foundation, and the theme search stack depends on the generated search route configuration and Pagefind output added here.

Description

Adds build-time Pagefind indexing and the generated /search/ route contract needed by the theme search UI.

This resolves the investigation in #5 into search options beyond Fuse.js: it adopts Pagefind and moves search indexing into the build output, rather than relying on a browser-side Fuse.js corpus as the primary search backend.

What Changes

  • Adds the Rust Pagefind crate as a direct dependency and runs Pagefind indexing after Zola renders HTML.
  • Writes Pagefind assets under output/pagefind/ for enabled build output.
  • Adds [search] pagefind = true workspace configuration, with search enabled by default for builds.
  • Adds build --no-search and parity build --no-search to skip Pagefind indexing for a build, overriding the workspace pagefind setting for that run.
  • Limits Pagefind indexing to build commands; serve and check still get a generated search route shell, but they do not run live indexing.
  • Generates content/search.md in the prepared Zola repo with template = "search.html" and the same extra.search state that is written to data/build_eips_search.toml.
  • Generates data/build_eips_search.toml with the search enabled state, base path, and Pagefind bundle path for theme consumption.
  • Writes the generated search route only into the prepared repo, not the source worktree.
  • Marks the generated search state disabled for serve, check, and --no-search builds.
  • Resolves search base paths from the active build base URL, including --base-url overrides.
  • Refuses to generate the route if user-authored content already occupies content/search.md, content/search/, content/search/index.md, or content/search/_index.md.
  • Cleans the existing output/pagefind/ directory before re-indexing on enabled builds, and keeps stale Pagefind assets untouched when search is disabled instead of silently deleting prior build output.
  • Adds validation for route collision detection, prepared-repo route placement, targeted --only builds, base-path behavior, disabled route state, stale asset handling, and the Pagefind import boundary.
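A minimal sketch of the output/pagefind/ cleanup policy from the list above, using only std::fs: the directory is cleared before re-indexing on an enabled build and left alone when search is disabled. The function and enum names are hypothetical and are not taken from the PR's source.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// What the hypothetical helper did to the Pagefind output directory.
#[derive(Debug, PartialEq)]
pub enum PagefindDirAction {
    /// Search enabled: any previous bundle was removed and an empty directory is ready.
    Cleaned(PathBuf),
    /// Search disabled: existing assets, if any, were left in place.
    LeftUntouched,
}

/// Hypothetical sketch: clear `output/pagefind/` only when this build will re-index.
pub fn prepare_pagefind_dir(output_dir: &Path, search_enabled: bool) -> io::Result<PagefindDirAction> {
    let pagefind_dir = output_dir.join("pagefind");
    if !search_enabled {
        // Disabled builds never delete prior build output.
        return Ok(PagefindDirAction::LeftUntouched);
    }
    if pagefind_dir.exists() {
        // Drop stale assets so the new index is not mixed with old fragments.
        fs::remove_dir_all(&pagefind_dir)?;
    }
    fs::create_dir_all(&pagefind_dir)?;
    Ok(PagefindDirAction::Cleaned(pagefind_dir))
}
```

Returning the action instead of only logging it keeps the disabled-search path easy to assert in tests, which is the kind of stale-asset check the list describes.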

Design Rationale

Why Pagefind instead of Fuse.js?

Pagefind is built for static sites: it indexes rendered HTML at build time, writes static search assets, and lets the browser load the search runtime and index data when search is used. That is a better fit than making the client parse and search a large generated corpus directly.

Why run Pagefind from build-eips?

The preprocessor can call the Rust Pagefind crate directly, so contributors do not need a separate non-Rust search toolchain such as Node.js or Python, and do not need to install, version, or manage a separate Pagefind binary on PATH.

Why index rendered HTML?

The rendered site is the search source of truth. Pagefind can use the same rendered pages and indexing hooks that the theme exposes, rather than treating feed or metadata output as a search corpus.

Why keep the integration behind a search module boundary?

Pagefind API usage is isolated in src/search/pagefind.rs, while the rest of the preprocessor talks to build-eips-owned search types. That keeps the integration reviewable and makes it easier to maintain or remove, without spreading Pagefind API usage throughout the preprocessor.
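One way to picture that boundary, as a dependency-free sketch: the pipeline depends only on a build-eips-owned trait and request/summary types, and a Pagefind-backed implementation would be the single place that imports the crate. IndexRequest, IndexSummary, SearchIndexer, run_search_step, and FakeIndexer are hypothetical names, and no real Pagefind calls are shown.

```rust
use std::path::PathBuf;

/// Build-eips-owned request type (hypothetical): callers never see Pagefind types.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexRequest {
    pub site_dir: PathBuf,   // rendered Zola output to index
    pub bundle_dir: PathBuf, // where the search assets are written
}

/// Build-eips-owned result type (hypothetical).
#[derive(Debug, PartialEq)]
pub struct IndexSummary {
    pub pages_indexed: usize,
}

/// The only seam the rest of the preprocessor depends on.
pub trait SearchIndexer {
    fn index(&self, request: &IndexRequest) -> Result<IndexSummary, String>;
}

/// Pipeline-side caller: it knows about the trait, not about Pagefind.
pub fn run_search_step(
    indexer: &dyn SearchIndexer,
    enabled: bool,
    output_dir: PathBuf,
) -> Result<Option<IndexSummary>, String> {
    if !enabled {
        return Ok(None);
    }
    let request = IndexRequest {
        bundle_dir: output_dir.join("pagefind"),
        site_dir: output_dir,
    };
    indexer.index(&request).map(Some)
}

/// Test double standing in for the Pagefind-backed implementation.
pub struct FakeIndexer;

impl SearchIndexer for FakeIndexer {
    fn index(&self, request: &IndexRequest) -> Result<IndexSummary, String> {
        // Accept only the expected bundle location and pretend one page was indexed.
        if request.bundle_dir.ends_with("pagefind") {
            Ok(IndexSummary { pages_indexed: 1 })
        } else {
            Err(format!("unexpected bundle dir {}", request.bundle_dir.display()))
        }
    }
}
```

Swapping FakeIndexer for a real backend changes nothing on the caller side, which is the property that makes the integration removable.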

This PR does not add theme templates, browser search runtime behavior, full search filters, search pagination, Created Date filtering, search restore, or the rendered search corpus artifact. Those land in later stacked PRs.

Closes #5

ritorhymes marked this pull request as ready for review May 20, 2026 23:34
ritorhymes added 20 commits May 25, 2026 20:43

Add the .build-eips.repo.toml schema, loader, validation rules, and
manifest tests for active proposal repositories and declared sibling
repositories.

Introduce ActiveRepoIdentity so later workspace lifecycle and execution
layers can select manifest-backed repository metadata while the legacy
EIPs/ERCs fallback continues to operate.

Add the .build-eips.toml schema, starter config text, upward
discovery, and loaded workspace config accessors.

Define server/site defaults, workspace build-root paths, local theme
and repo paths, and strict parsing for unsupported config fields.

Leave init, doctor, runtime consumption, and render-only filtering to
the later workspace, execution, and targeted rendering PRs.

Add source materialization modes, reshape RepositoryUse around resolved
repository endpoints, and pass source mode explicitly into Fresh.

Add dirty working-tree copying, tracked-path sync, dirty rejection
errors, and sibling merge behavior that follows local file sibling
HEADs.

Route existing build and changed-file flows through clean source
materialization so current behavior stays unchanged.

Add build-eips init to create a workspace root, clone declared sibling
repos and the shared theme, optionally clone template, and create the
local build root.

Write starter .build-eips.toml only when missing and regenerate a base
WORKSPACE.md guide for the initialized workspace.

Use active repository identity and staging repo metadata for workspace
bootstrap while leaving doctor, platform-dev repos, and runtime behavior
to later PRs.

Add build-eips doctor for checking workspace config discovery, active
repo identity, repository layout, sibling manifests, theme checkout, and
required local tools.

Report ok/warn/fail diagnostics and fail the command when any check
records a failure.

Keep this focused on workspace diagnostics; execution policy, runtime
commands, and platform-dev setup land in later PRs.

Add ResolvedExecution for command source policy, build roots, base URL
overrides, staging/production selection, and clean versus dirty source
materialization.

Add CLI execution controls for production, remote siblings, build roots,
build/serve base URL resolution, plain build/serve/check clean mode, and
parity commands, with tests for the command matrix.

Route build, check, serve, clean, and changed-file listing through the
resolved policy while leaving targeted --only behavior to later PRs.

Resolve a workspace-local theme for Zola runtime commands and remove
the remote theme cache path.

Mount the selected theme under repo/themes/eips-theme for Zola, load
Zola config from the mounted theme, and load eipw config from the local
theme checkout.

Require workspace theme setup for build, check, serve, and parity
commands while leaving the prepared runtime pipeline to the next PR.

Move the Prepared runtime pipeline out of main.rs into pipeline.rs and
keep main.rs focused on dispatching resolved runtime operations.

Prepare runtime inputs from ResolvedExecution by cloning and fetching
sources, force-refreshing prepared git scratch refs, merging sibling
proposal content while keeping the active homepage, running eipw lint,
preprocessing markdown, and materializing the local theme for Zola.

Keep the existing minimal Prepared::serve method with the type, while
leaving serve watcher and sync behavior to the serve runtime PR.

Add server binding resolution and serve-only host/port flags for local
Zola serve commands.

Run Zola serve with the resolved server binding, optional base URL
override, fast/force serve flags, and generated output directory.

Add dirty serve watching for dirty active-repo paths and local theme
changes. Clean mode disables active-repo sync but keeps theme sync.
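A sketch of the precedence a server binding resolver like this typically applies, assuming a serve-only CLI flag beats the workspace config, which in turn beats a fallback. The 127.0.0.1:1111 fallback mirrors zola serve's own defaults and is an assumption here, as are all names in the block.

```rust
/// Hypothetical resolved binding for a local server.
#[derive(Debug, Clone, PartialEq)]
pub struct ServerBinding {
    pub host: String,
    pub port: u16,
}

// Assumed fallbacks: these match `zola serve` defaults, not necessarily build-eips' own.
const DEFAULT_HOST: &str = "127.0.0.1";
const DEFAULT_PORT: u16 = 1111;

/// Hypothetical sketch: CLI flag first, then workspace config, then the fallback.
pub fn resolve_binding(
    cli_host: Option<&str>,
    cli_port: Option<u16>,
    config_host: Option<&str>,
    config_port: Option<u16>,
) -> ServerBinding {
    ServerBinding {
        host: cli_host.or(config_host).unwrap_or(DEFAULT_HOST).to_string(),
        port: cli_port.or(config_port).unwrap_or(DEFAULT_PORT),
    }
}

impl ServerBinding {
    /// Address string in the form `host:port`.
    pub fn address(&self) -> String {
        format!("{}:{}", self.host, self.port)
    }
}
```

Host and port are resolved independently, so a CLI host can combine with a configured port. The same resolver could back preview's host/port flags, which the next commit says reuse server binding resolution.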
Add build-eips preview for serving the existing resolved output
directory without rebuilding or starting dirty sync.

Reuse server binding resolution and preview-only host/port flags, and
report missing output before binding the local server.

Add a tiny_http static file server with safe path resolution,
index-file fallback, basic content types, and preview path tests.
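A sketch of the safe path resolution a small static preview server needs: every request path component must be a normal name, so `..` can never climb out of the output directory, and directory requests fall back to index.html. The function name is hypothetical, it is not the PR's tiny_http handler, and percent-decoding is omitted.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical sketch: map a request path onto a file under `root`, refusing traversal.
pub fn resolve_request_path(root: &Path, url_path: &str) -> Option<PathBuf> {
    // Only the path matters for file lookup; drop any query string or fragment.
    let path_only = url_path
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or("");
    let mut resolved = root.to_path_buf();
    for component in Path::new(path_only.trim_start_matches('/')).components() {
        match component {
            Component::Normal(part) => resolved.push(part),
            Component::CurDir => {}
            // `..`, a second root, or a Windows drive prefix could escape `root`.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    // Directory-style requests fall back to that directory's index.html.
    if path_only.is_empty() || path_only.ends_with('/') || resolved.is_dir() {
        resolved.push("index.html");
    }
    Some(resolved)
}
```

Rejecting bad components up front, rather than canonicalizing and comparing prefixes afterwards, avoids touching the filesystem for requests that are already known to be unsafe.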
Add proposal number parsing, editorial selector classification, and
content-path helpers for flat and directory proposal layouts.
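A sketch of proposal-number parsing and the two content layouts, assuming selectors such as 1234, EIP-1234, or erc-1234 and assuming the flat and directory layouts are `<n>.md` and `<n>/index.md`. The file names build-eips actually uses are not stated here, so treat both the names and the helpers as hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch: accept "1234", "EIP-1234", or "erc-1234" as proposal selectors.
pub fn parse_proposal_number(selector: &str) -> Option<u32> {
    let lower = selector.trim().to_ascii_lowercase();
    let digits = lower
        .strip_prefix("eip-")
        .or_else(|| lower.strip_prefix("erc-"))
        .unwrap_or(lower.as_str());
    // Reject empty strings and signs such as "+12", which str::parse would accept.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}

/// Assumed layouts: a single markdown file, or a directory whose page is index.md.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Layout {
    Flat,
    Directory,
}

pub fn proposal_content_path(content_dir: &Path, number: u32, layout: Layout) -> PathBuf {
    match layout {
        Layout::Flat => content_dir.join(format!("{number}.md")),
        Layout::Directory => content_dir.join(number.to_string()).join("index.md"),
    }
}
```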

Add OnlyRenderPlan to index proposal content for selected rendering,
derive EIP/ERC public URLs, choose internal versus external required
references, filter/prune content, and gate dirty path sync.

Keep this as internal foundation for editorial integration and targeted
build/serve rendering; no user-facing --only config lands here.

Add `build-eips editorial lint` and `build-eips editorial check` as the
first user-facing proposal-selection commands.

Keep eipw options scoped to editorial lint/check. Normal build, check,
and serve prepare runtime sources, preprocess markdown, and run Zola
without carrying eipw source-selection flags.

Run editorial-selected eipw lint against the prepared merged source tree
so cross-repo EIP/ERC references resolve through the same content layout
used by local builds.

Prepare runtime sources from the local active checkout, merge sibling
repositories, and keep active-upstream fetches in changed-file
comparison and editorial `--against-upstream` target selection.

Add build-only render selection from --only and workspace
[render].only, deduping proposal numbers into ResolvedExecution.
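A sketch of deduplicating the selection with an ordered set. Whether --only replaces [render].only or is merged with it is not stated in this commit text; the hypothetical resolve_only below assumes the two lists are unioned and treats an empty result as "no targeted rendering".

```rust
use std::collections::BTreeSet;

/// Hypothetical sketch: merge `--only` numbers with workspace `[render].only` numbers.
/// Assumption: both sources are unioned; the BTreeSet removes duplicates and sorts.
pub fn resolve_only(cli_only: &[u32], config_only: &[u32]) -> Option<BTreeSet<u32>> {
    let selected: BTreeSet<u32> = cli_only.iter().chain(config_only).copied().collect();
    // Assumption: an empty selection means "render everything", modeled here as None.
    if selected.is_empty() {
        None
    } else {
        Some(selected)
    }
}
```

An ordered set also gives a deterministic iteration order, which keeps any later plan building independent of how the numbers were typed.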

Build OnlyRenderPlan during prepared runtime setup, rewrite omitted
proposal links and requires entries to public URLs, and prune unselected
proposal content before Zola runs.

Restrict targeted rendering to local dirty build mode for now, leaving
targeted serve sync to the next PR. Remove the eipw lint step from the
prepared runtime pipeline so linting is reached only through editorial
commands.

Resolve cross-proposal asset links before Zola sees prepared markdown.

Add proposal asset path resolution, rendered URL builders, and an
OnlyRenderPlan asset inventory so links can be validated before targeted
pruning removes omitted proposal content.

Rewrite static asset links to rendered relative URLs when targets are
available locally, and to public EIP/ERC asset URLs when targeted
rendering omits the target proposal. Keep selected asset markdown links
on the existing Zola @/... path, while omitted asset markdown links use
public page URLs.

Preserve query strings and fragments, leave fragment-only and raw HTML
links untouched, and skip already-generated Zola markdown so repeated
preprocessing remains idempotent.
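The query-and-fragment rule above can be sketched as a split at the first `?` or `#`: only the path part is handed to the resolver, and whatever follows is re-attached unchanged. rewrite_link and the resolve callback are hypothetical stand-ins for build-eips' asset lookup; raw HTML detection and the idempotency check are not modeled.

```rust
/// Hypothetical sketch: rewrite the path part of a link, keeping any `?query` or `#fragment`.
/// `resolve` returns the rewritten path, or None to leave the link as written.
pub fn rewrite_link<F>(link: &str, resolve: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    // Fragment-only links point inside the current page; leave them alone.
    if link.starts_with('#') {
        return link.to_string();
    }
    let split_at = link.find(|c: char| c == '?' || c == '#').unwrap_or(link.len());
    let (path, suffix) = link.split_at(split_at);
    match resolve(path) {
        Some(new_path) => format!("{new_path}{suffix}"),
        None => link.to_string(),
    }
}
```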
Extend targeted rendering to local dirty serve by accepting --only on
serve and applying workspace [render].only to local serve runs.

Pass OnlyRenderPlan into dirty serve sync and filter active-repo dirty
paths to selected proposal content. Avoid reintroducing omitted proposal
markdown or assets into the materialized repo.

Add incremental targeted markdown preprocessing for dirty serve updates,
including selected asset markdown, retained non-proposal pages, selected
deletions, and filesystem timestamp fallback for new dirty files.

Add --platform-dev workspace initialization for cloning optional
preprocessor and eipw repos alongside proposal repos and theme.

Add POSIX and PowerShell dev-setup scripts that build the local
build-eips binary, ensure a supported Zola is available, and install
0.22.1 when needed while reusing existing 0.22.1-or-newer installs.
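The Zola reuse rule (keep 0.22.1 or newer, otherwise install 0.22.1) amounts to a tuple comparison on the parsed version. This Rust sketch is only illustrative: the setup scripts themselves are POSIX shell and PowerShell, and it assumes `zola --version` prints a line of the form `zola X.Y.Z`. All names are hypothetical.

```rust
/// Assumed minimum, taken from the version named in the commit message.
const MINIMUM_ZOLA: (u32, u32, u32) = (0, 22, 1);

/// Parse output such as "zola 0.22.1" into (major, minor, patch).
pub fn parse_zola_version(output: &str) -> Option<(u32, u32, u32)> {
    let version = output.trim().strip_prefix("zola ")?;
    let mut parts = version.split('.').map(|p| p.parse::<u32>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None; // more than three components: treat as unrecognized
    }
    Some((major, minor, patch))
}

/// True when no usable Zola was found or the found one is older than the minimum.
pub fn needs_install(version_output: Option<&str>) -> bool {
    match version_output.and_then(parse_zola_version) {
        Some(found) => found < MINIMUM_ZOLA, // tuples compare lexicographically
        None => true,                        // missing or unparseable: install the pinned version
    }
}
```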

Add setup documentation, release archive checksum sidecars, doctor
helper checks, and focused setup tests for contributor workspace setup.

Finalize the preprocessor integration docs and command help.

Describe staging, production, and parity as clean local-active runtime
modes that use remote sibling sources and selected environment metadata.

Keep CLI help, workspace guide, and architecture source-policy wording
aligned around the local active checkout source model.

Parse section index front matter directly as YAML before writing the
generated Zola TOML front matter. This preserves structured section
metadata such as extra.homepage_badges instead of flattening nested YAML
through the proposal preamble parser.

Keep proposal markdown on the existing proposal preamble path, and keep
body link rewriting active for section index pages.

Generate static/assets/data/proposals.json from prepared proposal
sources during the runtime build pipeline.

Collect proposal metadata through a shared catalog so JSON writing and
future prepared-runtime data passes use the same path validation,
preamble parsing, duplicate detection, targeted URL policy, and stable
proposal ordering.
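A sketch of two of the catalog properties named above, stable ordering and duplicate detection, using a BTreeMap keyed by proposal number. ProposalEntry and build_catalog are hypothetical and carry only illustrative fields, not the real proposals.json shape.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down catalog entry.
#[derive(Debug, Clone, PartialEq)]
pub struct ProposalEntry {
    pub number: u32,
    pub title: String,
    pub source_path: String,
}

/// Hypothetical sketch: key entries by number so output order is stable and a
/// second file claiming the same number is reported instead of overwritten.
pub fn build_catalog(entries: Vec<ProposalEntry>) -> Result<Vec<ProposalEntry>, String> {
    let mut by_number: BTreeMap<u32, ProposalEntry> = BTreeMap::new();
    for entry in entries {
        if let Some(existing) = by_number.get(&entry.number) {
            return Err(format!(
                "proposal {} defined in both {} and {}",
                entry.number, existing.source_path, entry.source_path
            ));
        }
        by_number.insert(entry.number, entry);
    }
    // BTreeMap iterates in ascending key order, independent of filesystem order.
    Ok(by_number.into_values().collect())
}
```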

Preserve the existing JSON shape, active repository prefix selection,
pretty formatting, omitted optional fields, and output-collision
protection.

Add build-only Pagefind indexing behind the search module boundary.

Use the Rust Pagefind crate from build-eips so indexing stays inside the
Rust preprocessor. This avoids requiring a separate non-Rust search
toolchain, such as Node.js or Python, and avoids shelling out to a
Pagefind binary that contributors would need to install, version, and
manage on PATH.

Add search config and build CLI controls: a workspace [search] block
with pagefind = true by default, and build --no-search on build and
parity build only. Indexing runs only on build, after Zola has
produced rendered HTML, and writes assets under output/pagefind/.
Running Pagefind from serve or check would require separate
lifecycle support and is out of scope here.
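The enablement rule in this commit reduces to a small predicate: only build and parity build can index, --no-search turns indexing off for that run, and the workspace [search] pagefind value defaults to true. The Command enum and search_enabled function are hypothetical names for illustration.

```rust
/// Commands that can reach the search step (hypothetical enum; names follow the PR text).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Command {
    Build,
    ParityBuild,
    Serve,
    Check,
}

/// Hypothetical sketch of the enablement rule.
/// `config_pagefind` is the workspace `[search] pagefind` value (None when unset);
/// `no_search` is the `--no-search` flag, which only build and parity build accept.
pub fn search_enabled(command: Command, config_pagefind: Option<bool>, no_search: bool) -> bool {
    let is_build = matches!(command, Command::Build | Command::ParityBuild);
    is_build && !no_search && config_pagefind.unwrap_or(true)
}
```

Serve and check would still receive the generated route shell; this predicate only decides whether its state is enabled and whether indexing runs.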

Generate the search route page and search state data used by the theme.
The route page is written into the prepared repo, not the source
worktree, for build, serve, and check so the theme always has a stable
shell; its state is marked disabled on serve, check, and --no-search
builds, and only an enabled build also writes the Pagefind bundle.
Refuse to write the route if user-authored content already occupies
content/search.md or content/search/, so the generated route never
silently overwrites user content.
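A sketch of the collision guard, checking the four paths the PR description reserves for the generated route. Files are checked before the bare content/search/ directory so the error names the most specific path. find_route_collision and check_search_route are hypothetical names, and which repository root the real check runs against is not modeled.

```rust
use std::path::{Path, PathBuf};

/// Paths the PR description lists as blocking the generated /search/ route.
/// Files come before the directory so an error names the most specific path.
const RESERVED: [&str; 4] = [
    "content/search.md",
    "content/search/index.md",
    "content/search/_index.md",
    "content/search",
];

/// Hypothetical sketch: return the first user-authored path that blocks the route.
pub fn find_route_collision(repo_root: &Path) -> Option<PathBuf> {
    RESERVED
        .iter()
        .map(|rel| repo_root.join(rel))
        .find(|candidate| candidate.exists())
}

/// Refuse to generate the route instead of overwriting user content.
pub fn check_search_route(repo_root: &Path) -> Result<(), String> {
    match find_route_collision(repo_root) {
        Some(path) => Err(format!(
            "refusing to generate /search/: {} already exists",
            path.display()
        )),
        None => Ok(()),
    }
}
```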

Keep Pagefind crate imports isolated to src/search/pagefind.rs and
expose the rest through build-eips-owned search types. This keeps the
integration compartmentalized enough to review, maintain, or remove
without spreading Pagefind API usage throughout the preprocessor.

Add targeted validation for the route and state contract: route
collision detection against user content, route placement in the
prepared repo under targeted --only builds, resolved base path
behavior including --base-url overrides, the disabled state for
non-build modes and --no-search builds, and the policy that
disabling search does not silently delete previously written
Pagefind assets.
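A sketch of the base-path behavior those tests cover: a --base-url override wins over the configured base URL, only the URL path is kept, and the Pagefind bundle path hangs off that prefix. Plain string handling stands in for a real URL parser, and resolve_base_path and pagefind_bundle_path are hypothetical names.

```rust
/// Hypothetical sketch: keep only the path of the active base URL, with a trailing slash.
/// A `--base-url` override, when given, replaces the configured base URL.
pub fn resolve_base_path(config_base_url: &str, cli_base_url: Option<&str>) -> String {
    let base_url = cli_base_url.unwrap_or(config_base_url);
    // Strip "scheme://" and the host, keeping only the path ("/preview" from "https://host/preview").
    let after_scheme = base_url.split_once("://").map_or(base_url, |(_, rest)| rest);
    let path = match after_scheme.find('/') {
        Some(i) => &after_scheme[i..],
        None => "",
    };
    format!("{}/", path.trim_end_matches('/'))
}

/// Bundle location under the resolved prefix, matching assets written to output/pagefind/.
pub fn pagefind_bundle_path(base_path: &str) -> String {
    format!("{base_path}pagefind/")
}
```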
ritorhymes force-pushed the preprocessor/search-pagefind-foundation branch from abfdc35 to cd7f53d May 26, 2026 01:02

Development

Successfully merging this pull request may close these issues.

Investigate search options besides fuse.js
