diff --git a/AGENTS.md b/AGENTS.md index dc0dd041..188dacf2 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -71,5 +71,7 @@ Key points: ## Misc rules -- Version control operations are for humans, not agents. +- Git commits and pushes are for humans, not agents. - No blank lines in functions. +- API endpoint functions should start with their REST verbs, + e.g., `post_something` or `get_something`. diff --git a/LATEX_EDITOR_PLAN.md b/LATEX_EDITOR_PLAN.md new file mode 100644 index 00000000..4942a30a --- /dev/null +++ b/LATEX_EDITOR_PLAN.md @@ -0,0 +1,678 @@ +# LaTeX Editor ("Overleaf replacement") — Implementation Plan + +## Goal + +Let a user open a publication on the publications page, click **Edit**, and get a +full-screen, closable LaTeX editor — an "app within Calkit" — where: + +- The `.tex` source is edited in a code editor (Overleaf-like split view). +- Compilation to PDF happens **client-side in WebAssembly** (no compile server). +- Changes flow back into the project's git repo. We start with **auto-git-commit** + (matching the existing `PUT contents` behavior) and later move to an + **editing-session = branch** model that squashes into a single project commit. +- The feel is collaborative, but the styling is our own (Chakra UI), not an Overleaf clone. +- **Onboarding is as easy as Overleaf**: sign up with Google / email / university SSO, join + a project via a shareable link, and start editing **with no GitHub account** — edits still + land as real commits in the project repo. This requires decoupling identity from GitHub + (see §2) and is as important to the goal as the editor itself. + +We want to reuse compilation code from [TeXlyre](https://github.com/TeXlyre/texlyre) +where it makes sense, but with eyes open about licensing (see below). Calkit is open source +(MIT), which shapes both the licensing and self-hosting choices below. 
+
+---
+
+## Decisions log
+
+Settled during planning (see referenced sections for rationale):
+
+| Topic | Decision | Ref |
+|---|---|---|
+| **License** | Path 1: our own loader around the WASM binaries; copy no TeXlyre source | §0 |
+| **TeX engine** | Upstream **busytex/busytex** (MIT, TeX Live 2023 + SyncTeX). The TeXlyre TeX Live 2026 build is AGPL and was rejected for Path 1. | §0, Phase 0 |
+| **Compile role** | Preview-only; never a pipeline artifact; pipeline stays source of truth | §3.1 |
+| **Preview download** | None; preview is view-only in the editor | §3.1 |
+| **Git hosting** | GitHub-backed; git-backend abstraction added up front; self-host deferred | §2.4, I4 |
+| **Push credential** | Done: existing Calkit GitHub App installation; only authorship routing remains | §2.2 |
+| **Onboarding (near-term)** | Google sign-in + email/password signup via invite links (pick password on first sign-in) | §2.3 |
+| **GitHub-less users** | Can be **collaborators**, but **cannot own/create projects** until git hosting is decoupled (I4). Owners must have a linked GitHub account. | §2.2, I1 |
+| **University SSO** | Deferred | I3 |
+| **Sequencing** | I1 + I2 land before/with editor Phase 1; Phase 0 spike runs in parallel | §6 |
+| **Phase 3 order** | 3a (real-time collaboration) first, then 3b (sessions-as-branches) | Phase 3 |
+| **TeX Live packages** | Upstream server for Phase 0/1; self-hosted cached proxy in Phase 2 | Phase 2 |
+
+**Open verification tasks (not decisions):** confirm BusyTeX + SwiftLaTeX engine `.wasm`
+artifact licenses fit Path 1 redistribution before Phase 1 ships.
+
+The near-term critical path: **Phase 0 compile spike (BusyTeX)** in parallel with **I1**
+(Google/email signup + GitHub-less authorship) and **I2** (native membership + invite
+links), then **editor Phase 1** (single-file edit → preview → auto-commit). The detailed
+task breakdown for that work is in §8.
+
+---
+
+## 0. The licensing decision (must resolve before writing code)
+
+This is the single most important gate on the plan.
+
+| Project | License | Implication |
+|---|---|---|
+| **calkit-cloud** | **MIT** (`LICENSE`) | Permissive; what we ship today |
+| **TeXlyre** | **AGPL-3.0** | Network-copyleft. Linking/integrating its code into our hosted web app would arguably obligate us to release Calkit's source under AGPL. |
+| **SwiftLaTeX** (engine TeXlyre uses) | main repo **AGPL-3.0**; engine wrapper files dual **EPL-2.0 / GPL-2.0 w/ Classpath exception** | The on-disk `.wasm` engines derive from TeX Live (mostly permissive/LPPL), but SwiftLaTeX's own loader/wrapper code is copyleft. |
+
+**Why this matters:** AGPL-3.0 is incompatible with keeping calkit-cloud MIT if we copy
+their source into our bundle, and since **Calkit is itself open source (MIT)**, pulling
+AGPL code into the tree would force the whole project (or at least the editor) to relicense.
+That reinforces Path 1 below (treat the engine as an arm's-length binary dependency, write
+our own loader). We have three realistic paths:
+
+1. **Clean-room reuse of the WASM engine binaries only.** Treat the compiled SwiftLaTeX
+   (`pdftex`/`xetex`) `.wasm` artifacts as a black-box dependency loaded in a Web Worker,
+   and write *our own* thin TypeScript loader/bridge (do **not** copy TeXlyre's React/TS
+   source). This is the cleanest separation but still needs the engine's own license
+   (EPL-2.0/GPL-2.0-classpath) verified as acceptable for redistribution. **Recommended
+   starting assumption**, pending a real license review.
+2. **Use a permissively-licensed engine instead.** Evaluate alternatives whose licensing is
+   friendlier (e.g. engines distributed under MIT/Apache, or texlive.net-style remote
+   compile as a fallback). Trade-off: less mature browser story than SwiftLaTeX/BusyTeX.
+3. 
**Accept AGPL for an isolated, separately-licensed sub-package.** Ship the editor as a + distinct AGPL module/micro-frontend with its own LICENSE, loaded at runtime. Legally + fragile for a hosted SaaS; only with counsel sign-off. + +> **DECIDED: Path 1.** We write our own loader/bridge around the WASM binaries and copy +> **no** TeXlyre React/TS source. +> +> **Engine license verification — DONE, and it has teeth (2026-06-17):** +> - `texlyre-busytex` (npm, TeX Live **2026**) is **AGPL-3.0-or-later** — does **not** fit +> Path 1 for an MIT project. Its AGPL covers the TS wrapper + build tooling. +> - Upstream `busytex/busytex` is **MIT** (code/scripts); its published `.wasm`/`.data` +> binaries carry TeX Live/LPPL (permissive) licenses — Path-1-clean — but bundle TeX Live +> **2023**, not 2026. (No npm package; GitHub-releases only.) +> - **Implication:** the license-clean engine is **busytex 2023**, not the TeXlyre 2026 +> build. See the revised engine decision below / in the Decisions log. + +A second gotcha that rides along with the engine choice: **SwiftLaTeX fetches TeX Live +packages on demand from a remote package server at compile time.** We must either +(a) point at SwiftLaTeX's public package server, (b) host our own package repository, or +(c) bundle a fixed TeX Live subset. For reproducibility and uptime we'll likely want our +own cached package endpoint eventually (see Phase 2, "TeX Live package proxy"). + +--- + +## 1. How this fits the existing codebase + +Grounded in the current architecture (researched, not assumed): + +### Frontend (`frontend/`) +- React 18 + TypeScript + Vite, **Chakra UI** components, **TanStack Router** (file-based) + + **TanStack Query**, auto-generated OpenAPI client in `src/client/`. +- Publications live at + `src/routes/_layout/$accountName/$projectName/_layout/publications.tsx` with components in + `src/components/Publications/` (`PublicationView.tsx`, `NewPublication.tsx`, + `ImportOverleaf.tsx`, `PdfAnnotator.tsx`). 
+- Full-screen modal pattern already exists (Chakra `Modal` with `size="full"`); see + `ArtifactCompareModal.tsx` / `FileViewModal.tsx` for large-modal precedent. +- **No editor/CRDT deps yet** — `codemirror`, `yjs`, `swiftlatex` are all net-new. +- File I/O today goes through the OpenAPI client: `getProjectContents()` (base64 or signed + URL per file), `putProjectContents()` (multipart upload → backend commits & pushes), + `postProjectFsBatchOp()` (batch file ops). History via `getProjectHistory()` / + `getProjectFileHistory()`; refs via `searchProjectRefs()`. + +### Backend (`backend/app/`) +- FastAPI + SQLModel + Postgres; **GitPython** for repo ops (`app/git.py`), DVC integration + in `app/dvc.py`, project/file logic in `app/projects.py` and + `app/api/routes/projects/core.py`. +- Repos are cloned per-user under `/tmp/{github_username}/{owner}/{project}/repo/`, guarded + by `FileLock`. `PUT contents` already does: write file → `git add` → `git commit` → + `git push origin ` (max 1 MB/file). +- **Publications** are entries in `calkit.yaml` (`Publication` model in + `app/models/core.py`): `path`, `title`, `type`, optional DVC `stage`, `storage` + (`git`/`dvc`/`dvc-zip`), and optional `overleaf` sync config. The PDF is usually a DVC + output of a pipeline stage; the `.tex`/`.bib` sources are typically git-tracked. +- Branch support is **read-only today**: refs can be listed and read without checkout, the + working tree always reflects the default branch, and there are **no branch + create/switch/merge endpoints** yet. This is the main backend gap for the + session-as-branch phase. +- Permissions: `get_project()` resolves Read < Write < Admin < Owner. Editing requires + **Write**. + +### What we can reuse vs. build +- **Reuse:** Chakra full-screen modal pattern, OpenAPI file endpoints for the auto-commit + MVP, existing Overleaf sync as a sibling feature, DVC URL resolution for figure assets. 
+- **Build:** CodeMirror-based editor, WASM compile worker + our loader, virtual filesystem + bridge (repo files ↔ engine FS), and (later) Yjs collaboration + branch/session backend + endpoints. + +--- + +## 2. Identity, onboarding & git hosting + +An Overleaf-grade editor is only as good as its onboarding. The requirement is: a new user +should be able to **sign up with Google / email / their university SSO, click a share link, +and start editing — with no GitHub account** — and their edits must still land as real +commits in the project's repo. This collides head-on with how Calkit works today, so it's a +first-class part of this plan, not an afterthought. + +### 2.1 The starting reality (researched) + +Calkit is currently **deeply GitHub-coupled**: + +- **Login is GitHub-only** in practice. Email/password infra exists (`UserRegister`/ + `UserCreate`, bcrypt, JWT reset tokens) but `POST /users/signup` is intentionally + **disabled (501)**. A `google-auth.tsx` callback route already exists on the frontend. +- **Every `Account` requires a non-null `github_name`** (`app/models/core.py`), used to + derive `github_username`, default repo URLs, and API calls. +- **Every `Project` requires a `git_repo_url` on github.com** (`Project.git_repo_url` is + non-nullable and validated to github.com; the repo is created via the GitHub API at + project-creation time). +- **Write access is derived from GitHub**: `UserProjectAccess` is a *cache* of + `GET /repos/{owner}/{repo}/collaborators/{user}/permission`. There is **no native + collaborators table, and no invite/share-link mechanism** today. +- **Pushes use the requesting user's own GitHub token** via the credential helper in + `app/git.py`; a user with no GitHub token gets a 401 and cannot push. 
+ +Useful foundations already in place to build on: the multi-provider +`UserExternalCredential` table (GitHub/Zenodo/Overleaf/**Google**), bcrypt password support, +refresh tokens, an `Account` abstraction already separate from `User`, and a native +`UserOrgMembership` table (proof we can do native membership without GitHub). + +### 2.2 The key decoupling: two identities in every commit + +The conceptual unlock is separating the two identities bundled into a git push today: + +1. **Authorship** — the `Author:` on the commit (name + email). Costs nothing, needs **no + GitHub account**. A browser editor can author commits as any signed-in Calkit user. +2. **Push credential** — write access to the *remote*. This is the only part that needs a + real token. + +Today both come from one person's GitHub token. If we split them, a GitHub-less contributor +can author commits that are **pushed under a project-level credential**. + +**DECIDED — and the push side is already built.** Calkit already has a **GitHub App +installation** that supplies push access for users with write permission, so the +project-level push credential exists; we don't need to build it. A short-lived, repo-scoped +installation token pushes the commit while the commit is *authored* by the real contributor. + +What remains (in I1/I2) is **not** the credential — it's wiring a **GitHub-less Calkit +identity's authorship** (name + verified email) through that existing App push path, so a +contributor who joined via a share link and has no GitHub token still produces a properly +attributed commit that the App pushes. + +### 2.3 Recommended direction: decouple identity, keep GitHub as the git backend (for now) + +This delivers the full onboarding requirement **without** taking on the risk of self-hosting +git. Changes, roughly in dependency order: + +1. 
**Turn on real onboarding (the near-term priority).** + - Make creating a Calkit Cloud account dead simple: **"Sign in with Google"** (callback + already stubbed) and **plain email/password signup** (infra exists, just re-enable). + - **Invite-link-driven signup is the primary path:** a project share link lands a new + user on a signup screen where they either continue with Google or **pick a password on + first sign-in**, and are dropped straight into the project. This ties §2.3.3's invite + links to the account-creation flow so onboarding is one continuous motion. + - **University SSO (SAML/OIDC) is deferred** (see I3) — not needed for the initial goal. + When we do it, lean toward buying a broker over hand-rolling SAML (§2.5 caveat). +2. **Make `github_name` optional** on `Account`; mint a Calkit account `name` independent of + GitHub. GitHub becomes *one linkable identity/credential*, not the root of identity. +3. **Native membership + invites.** Add a `ProjectMembership` table (role per user) and a + `ProjectInvitation` / **shareable join-link** table (token, role, expiry, max-uses). + Project access resolves from native membership **first**, with GitHub-collaborator sync + kept as one contributing source. This is what makes "click a link → start editing" + possible for non-GitHub users. +4. **Push via the existing GitHub App installation** (§2.2 — already built). The editor + commits with the contributor's authorship and pushes via the App's installation token; + the only new work is routing a GitHub-less contributor's authorship through that path. +5. **Decouple `git_repo_url` from github.com** behind a small **git-backend abstraction** + (see §2.4) so we're not hard-wired even while GitHub remains the only backend we ship + first. + +Net effect: a student signs in with Google, opens a share link, edits the `.tex` in the WASM +editor, and their commits push to the project's GitHub repo authored as them — no GitHub +account, no friction. 
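The native membership and join-link rules in items 3 and 4 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not existing Calkit code: `Invitation`, `new_invitation`, `redeem`, `effective_role`, and `can_edit` are hypothetical names, the Write cap on join links is an assumed policy, and "native first" is read here as "a native membership wins outright, with the cached GitHub permission only as a fallback".

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import IntEnum
from typing import Optional


class Role(IntEnum):
    """Ordered so comparisons follow Read < Write < Admin < Owner."""
    READ = 1
    WRITE = 2
    ADMIN = 3
    OWNER = 4


# Assumption: join links never grant more than Write, to limit leak damage.
MAX_INVITE_ROLE = Role.WRITE


@dataclass
class Invitation:
    """Hypothetical stand-in for a ProjectInvitation row."""
    token: str
    role: Role
    expires_at: datetime
    max_uses: int
    uses: int = 0


def new_invitation(role: Role, ttl: timedelta, max_uses: int, now: datetime) -> Invitation:
    if role > MAX_INVITE_ROLE:
        raise ValueError("join links are capped at Write in this sketch")
    return Invitation(secrets.token_urlsafe(32), role, now + ttl, max_uses)


def redeem(invite: Invitation, now: datetime) -> Role:
    """Consume one use of the link and return the role to store as a membership."""
    if now >= invite.expires_at:
        raise ValueError("invitation expired")
    if invite.uses >= invite.max_uses:
        raise ValueError("invitation exhausted")
    invite.uses += 1
    return invite.role


def effective_role(native: Optional[Role], github: Optional[Role]) -> Optional[Role]:
    """Native membership wins; the cached GitHub permission is only a fallback."""
    return native if native is not None else github


def can_edit(native: Optional[Role], github: Optional[Role]) -> bool:
    role = effective_role(native, github)
    return role is not None and role >= Role.WRITE


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
invite = new_invitation(Role.WRITE, timedelta(days=7), max_uses=2, now=now)
granted = redeem(invite, now)  # a GitHub-less user joins via the link
```

Whether GitHub permissions should be able to raise a lower native role is a policy choice this sketch does not settle; swapping `effective_role` for a `max` over both sources would give the other reading.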
+ +### 2.4 The bigger bet: self-host git, optionally mirror to GitHub + +You raised hosting repos ourselves. It's attractive (truly breaks the GitHub dependence; no +contributor ever needs GitHub) but it's a large, separable bet: + +- **How:** `dulwich` is *already a dependency* and can serve git smart-HTTP; alternatively + run Gitea/Forgejo. Store repos on our infra (object storage + metadata), optional **push + mirror to GitHub** so GitHub-native users keep their workflow. +- **Costs/risks you flagged are real:** scaling git hosting (packfiles, storage, the + DVC/large-file interplay), migrating existing github.com-backed projects, ops/on-call + burden, and spooking users who *want* their work on GitHub. +- **DECIDED:** don't self-host now. Introduce the **git-backend abstraction (§2.3.5) up + front** so `Project` isn't welded to github.com, ship the **GitHub-backed** implementation + first, and keep **self-hosted git as a pluggable backend** (I4) we can enable later or + per-deployment without re-architecting. + +### 2.5 Open-source implications + +- Calkit being **MIT/open source** reinforces the §0 licensing stance (no AGPL in-tree). +- Cloud-only onboarding features (paid SSO broker, GitHub App credentials, hosted SAML) + must **degrade gracefully in self-hosted OSS builds** — gate them behind config so a + community deployment still works with plain email/password (and, ironically, self-hosted + git is the *most* OSS-friendly backend since it needs no github.com at all). + +### 2.6 New identity workstream (phases) + +These are largely independent of the editor's compile work but **gate its collaborative +value** — non-GitHub contributors can't meaningfully use the editor until I1–I2 land. + +- **I1 — Onboarding & push decoupling:** Google + email signup; `github_name` optional; + route GitHub-less contributor authorship through the **existing** GitHub App push path. 
+ *(Enables: non-GitHub user edits → commit authored as them → pushed by the App.)* +- **I2 — Native membership & share links:** `ProjectMembership` + `ProjectInvitation` + (join links); access checks resolve natively first. *(Enables: "click link → start + editing.")* +- **I3 — University SSO (deferred):** SAML/OIDC, lean toward a broker. Not required for the + initial launch; revisit after I1/I2 prove out the Google + email + invite-link flow. +- **I4 — (optional, bigger) Self-hosted git:** git-backend abstraction + dulwich/Gitea + backend + optional GitHub mirror. + +> **Sequencing — DECIDED:** **I1 + I2 land alongside or before editor Phase 1.** The first +> editor release must be usable by GitHub-less contributors (Google/email signup + invite +> links), since that's the audience the whole feature targets. Editor Phase 0 (the compile +> spike) can proceed in parallel with I1/I2, but Phase 1 does not ship without them. + +--- + +## 3. Target architecture + +``` +┌───────────────────────────────────────────────────────────────────────┐ +│ Publications page ──[Edit]──▶ (Chakra size=full) │ +│ │ +│ ┌─────────────┬───────────────────────┬──────────────────────────┐ │ +│ │ File tree │ CodeMirror editor │ PDF preview (pdf.js) │ │ +│ │ (.tex/.bib │ (LaTeX mode, errors) │ + log / SyncTeX jumps │ │ +│ │ figures) │ │ │ │ +│ └─────────────┴───────────────────────┴──────────────────────────┘ │ +│ │ │ ▲ │ +│ │ ▼ │ │ +│ │ ┌──────────────────────┐ compiled PDF + log │ +│ │ │ Compile Web Worker │───────────┘ │ +│ │ │ (SwiftLaTeX WASM + │ │ +│ │ │ our TS loader) │ │ +│ │ └──────────┬───────────┘ │ +│ │ │ reads/writes │ +│ ▼ ▼ │ +│ ┌──────────────────────────────────────┐ │ +│ │ Virtual FS (in-memory / IndexedDB) │ ◀── seeded from repo │ +│ └──────────────────────────────────────┘ via getProjectContents │ +│ │ │ +│ ▼ save (debounced / on close) │ +│ OpenAPI client ── putProjectContents / fsBatchOp ──▶ backend │ +│ backend: git add/commit/push (auto-commit MVP) │ 
+└───────────────────────────────────────────────────────────────────────┘ +``` + +Three layers, each independently testable: + +1. **Editor UI** — modal, file tree, CodeMirror, PDF preview. Pure frontend. +2. **Compile core** — Web Worker hosting the WASM engine + a virtual filesystem. No + network at compile time except the TeX Live package fetch. +3. **Persistence** — virtual FS ⇄ project repo. MVP = per-file auto-commit through existing + endpoints; later = session branch + squash. + +### 3.1 Compilation is preview-only — provenance lives in the pipeline + +A load-bearing principle for the whole feature: **the WASM compile exists to support editing +the writing, not to produce artifacts.** A PDF compiled in the browser is a disposable +preview, **never a valid pipeline output**, and carries no provenance. + +- **Source of truth stays the pipeline.** The official, citable PDF is produced by the + project's DVC/pipeline stage and cached as it is today. The editor never writes a PDF into + the repo, DVC, or object storage, and never updates a publication's canonical artifact. +- **Preview PDFs are ephemeral.** They live in the browser (worker memory / IndexedDB cache) + for the editing session and are thrown away. Nothing server-side persists them. +- **No download (DECIDED).** The draft preview is **view-only in the editor** — no download + button. This is the strongest guard against a dirty working copy masquerading as "the + paper." Users who need a file run the pipeline, which produces the official, provenance- + tracked artifact. (Revisit only if there's real demand; if ever added, it would be a + clearly-labeled, commit-named "dirty" file.) +- **UI framing.** The editor surfaces the preview as a "draft preview," visually distinct + from the published artifact shown on the publications page. Regenerating the *official* + PDF remains a pipeline run, never an editor action. 
+ +**Future step (out of scope here): provenance-perfect builds with user compute.** Eventually +we may let a user attach their own compute to a Calkit project and run the real build there +via `calkit`, so an "official" compile is reproducible and fully tracked — the pipeline +produces and caches it, exactly as the canonical path does now. That would be the *only* +sanctioned way to promote a compiled PDF to a real artifact; the in-browser WASM path stays +preview-only regardless. + +--- + +## 4. Phased delivery + +### Phase 0 — Spike & decisions (no production code) +- Resolve the **licensing path** (§0) — blocker for everything else. +- Stand up a throwaway spike: load the **upstream MIT busytex** WASM (TeX Live 2023, SyncTeX + support) in a Web Worker, compile a hello-world `.tex` to PDF entirely in the browser, + render with pdf.js. Confirm bundle size, cold-start time, and where TeX Live packages come + from. (Engine license already verified — §0: MIT busytex is Path-1-clean; the TeXlyre 2026 + build is AGPL and rejected.) +- Decide editor lib (**CodeMirror 6** recommended — it's what TeXlyre uses, lighter than + Monaco, good LaTeX support, and the same core we'd need for Yjs later via `y-codemirror`). +- **Exit criteria:** a documented yes/no on engine + license, and a measured compile of a + real publication `.tex` from one of our projects. + +### Phase 1 — Single-user editor MVP (auto-commit) +Scope: one publication, its primary `.tex`, edit + compile + preview + save. + +**Frontend** +- `Edit` button on the publications page (gated on Write permission) opens + `LatexEditorModal` (Chakra `Modal size="full"`, closable, with unsaved-changes guard). +- `LatexEditor` component: CodeMirror (LaTeX syntax + error squiggles) | PDF preview pane. +- On open: fetch the publication's source file(s) via `getProjectContents()` into the + virtual FS. Initially handle the single `.tex` and any sibling `.bib`. 
+- "Compile" (manual + debounced auto) posts the FS to the compile worker, renders the
+  returned PDF, and surfaces the log/errors in a collapsible panel.
+- "Save": write changed files back via `putProjectContents()` (per-file). Auto-commit on
+  the backend gives us versioning for free. Debounce + save-on-close.
+
+**Backend**
+- Likely **no new endpoints** for the MVP; reuse `PUT contents`. Possible small additions:
+  surface (or relax) the 1 MB per-file limit for `.tex` files, and confirm that
+  `getProjectContents` returns raw text suitable for editing.
+
+**Out of scope for Phase 1:** multi-file projects with `\input`, DVC figures, collaboration,
+branches.
+
+**Exit criteria:** edit a real paper's `.tex`, compile to PDF in-browser, save, and see the
+auto-commit land on the default branch with a push to GitHub.
+
+### Phase 2: Real projects (multi-file, figures, bib, SyncTeX)
+- **Virtual FS seeding for a whole publication directory**: resolve all dependencies
+  (`\input`/`\include`, `\bibliography`, `\includegraphics`). Pull git-tracked sources as
+  text and **DVC/large binary figures via their signed URLs** (`getProjectContents` already
+  returns `url` for DVC-stored files; figures may be pipeline outputs).
+- **File tree** panel (read + edit text files; figures shown read-only).
+- **bibtex/biber + multi-pass** compile orchestration in the worker.
+- **SyncTeX** forward/inverse search (click PDF ↔ jump to source); the chosen BusyTeX
+  build supports SyncTeX.
+- **TeX Live package proxy** (optional but recommended): a backend route that caches the
+  packages the engine requests, so compiles are reproducible and don't depend on a
+  third-party package server.
+- **Batch save** via `postProjectFsBatchOp()` to commit multiple changed files in one
+  commit instead of N.
+
+**Exit criteria:** a multi-file paper with figures and a `.bib` compiles to the same PDF
+the pipeline produces (or close enough), with figures resolved.
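The dependency resolution in the first Phase 2 bullet could start from a scan like the one below. It is a rough sketch in Python for brevity (the real seeding would probably run in the frontend), and it rests on assumptions: `direct_refs` and `collect_dependencies` are hypothetical names, the scan is regex-based rather than a real TeX parser, all paths are resolved relative to the main file's directory, and commands such as `\graphicspath`, `\subfile`, or `\addbibresource` are ignored.

```python
from __future__ import annotations

import posixpath
import re

# \input{..}, \include{..}, \bibliography{..}, \includegraphics[..]{..}
_REF = re.compile(
    r"\\(input|include|bibliography|includegraphics)\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}"
)
_COMMENT = re.compile(r"(?<!\\)%.*")  # strip comments, keep escaped \%


def direct_refs(tex: str) -> list[tuple[str, str]]:
    """Return (command, path) pairs referenced directly by one .tex source."""
    refs = []
    for line in tex.splitlines():
        for cmd, arg in _REF.findall(_COMMENT.sub("", line)):
            # Only \bibliography takes a comma-separated list.
            items = arg.split(",") if cmd == "bibliography" else [arg]
            for item in (i.strip() for i in items):
                if not item:
                    continue
                if cmd in ("input", "include") and not posixpath.splitext(item)[1]:
                    item += ".tex"  # LaTeX appends .tex when no extension is given
                elif cmd == "bibliography" and not item.endswith(".bib"):
                    item += ".bib"
                refs.append((cmd, item))
    return refs


def collect_dependencies(main: str, files: dict[str, str]) -> dict[str, list[str]]:
    """Walk from `main` over an in-memory {path: text} map of git-tracked sources."""
    base = posixpath.dirname(main)  # assumption: everything resolves from here
    sources, assets, missing = [], [], []
    queue, seen = [main], set()
    while queue:
        path = queue.pop(0)
        if path in seen:
            continue
        seen.add(path)
        if path not in files:
            missing.append(path)
            continue
        sources.append(path)
        if not path.endswith(".tex"):
            continue  # .bib files are leaves
        for cmd, ref in direct_refs(files[path]):
            target = posixpath.normpath(posixpath.join(base, ref))
            if cmd == "includegraphics":
                # Figures are fetched separately (e.g. via signed URLs).
                if target not in assets:
                    assets.append(target)
            else:
                queue.append(target)
    return {"sources": sources, "assets": assets, "missing": missing}


files = {
    "paper/main.tex": r"""
\input{sections/intro}  % \input{ignored}
\includegraphics[width=\linewidth]{figures/plot.pdf}
\bibliography{refs}
""",
    "paper/sections/intro.tex": r"\include{sections/methods}",
    "paper/refs.bib": "@article{x, title={T}}",
}
deps = collect_dependencies("paper/main.tex", files)
```

The `missing` list is the useful part for a UI: it names files the paper references that were not seeded, before the engine fails on them.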
+
+### Phase 3: Collaboration and/or editing sessions
+
+Two related but separable upgrades. **Decided order: 3a (real-time collaboration) first**,
+then 3b (sessions-as-branches), because collaboration delivers the headline Overleaf feel
+soonest and reuses the Phase 1 auto-commit model unchanged.
+
+**3a. Real-time collaboration (Yjs)**
+- Add `yjs` + `y-codemirror.next`. Shared doc, live cursors/selections.
+- Needs a sync transport: a **WebSocket relay** (`y-websocket`, server-authoritative, which
+  fits our hosted model better than TeXlyre's P2P WebRTC) or WebRTC with a signaling server.
+  Server-authoritative is the recommended fit for Calkit since we already have a backend.
+- Presence/awareness UI in Chakra. Persistence of the live doc (Redis/Postgres or a doc
+  service) is the main new infra.
+
+**3b. Editing sessions = branches (squash on finish)**
+This is the bigger backend lift because branch *writes* don't exist yet.
+- New backend capability in `app/git.py` / projects routes:
+  - Start session → create branch `editor-session/` from default, check it out in the
+    per-user repo clone.
+  - Commit edits to the session branch (auto-commit, frequent, cheap).
+  - Finish session → **squash-merge** the branch into the default branch as a single,
+    well-described commit; delete the session branch. Handle conflicts/abort.
+  - Discard session → delete branch, no merge.
+- Data model: an `EditingSession` (or a reused `FileLock`-style table) tracking branch name,
+  owner, publication path, status, base commit.
+- Frontend: session lifecycle UI (start/resume/finish/discard), "draft" vs "published"
+  state, and a diff/review of the squash before it merges (we already have
+  `ArtifactCompareModal` + `react-diff-viewer` to lean on).
+- Interaction with the current "working tree = default branch" assumption and the per-user
+  `/tmp` clone model needs care: concurrent sessions and the existing `PUT contents`
+  auto-commit path must not stomp each other.
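The 3b session lifecycle maps onto a short sequence of ordinary git commands, sketched below in Python as a plan builder rather than an executor. Everything here is an assumption for illustration: the `EditingSession` fields, the `session_id` used to complete the branch name, and the helper names are hypothetical, and conflict handling, locking, and pushing are left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EditingSession:
    """Hypothetical session record; field names are assumptions."""
    session_id: str         # hypothetical identifier used in the branch name
    default_branch: str
    author: str             # contributor identity as "Name <email>"
    status: str = "active"  # active -> finished | discarded

    @property
    def branch(self) -> str:
        return f"editor-session/{self.session_id}"


def start_commands(s: EditingSession) -> list[list[str]]:
    # Create the session branch from the default branch and check it out.
    return [["git", "checkout", "-b", s.branch, s.default_branch]]


def finish_commands(s: EditingSession, message: str) -> list[list[str]]:
    return [
        ["git", "checkout", s.default_branch],
        # --squash stages the combined changes without creating a commit...
        ["git", "merge", "--squash", s.branch],
        # ...so a single commit is made explicitly, authored as the contributor.
        ["git", "commit", "-m", message, f"--author={s.author}"],
        # -D is needed: a squash merge does not mark the branch as merged.
        ["git", "branch", "-D", s.branch],
    ]


def discard_commands(s: EditingSession) -> list[list[str]]:
    return [
        ["git", "checkout", s.default_branch],
        ["git", "branch", "-D", s.branch],
    ]


def close(s: EditingSession, action: str, message: str = "") -> list[list[str]]:
    """Return the git plan for finishing or discarding, and update the status."""
    if s.status != "active":
        raise ValueError(f"session is already {s.status}")
    if action == "finish":
        plan, s.status = finish_commands(s, message), "finished"
    elif action == "discard":
        plan, s.status = discard_commands(s), "discarded"
    else:
        raise ValueError(f"unknown action: {action}")
    return plan


session = EditingSession("abc123", "main", "Ada Lovelace <ada@example.com>")
plan = close(session, "finish", "Revise introduction")
```

Returning the commands as data keeps the sketch testable without a repository; a real implementation would run them under the existing `FileLock` and abort cleanly if the squash merge reports conflicts.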
+ +**Exit criteria (3b):** start a session, make several edits/compiles, finish → exactly one +clean commit on the default branch; discard → no trace. + +--- + +## 5. Key technical risks & open questions + +1. **License (§0)** — gating. Resolve first. +2. **TeX Live package delivery** — on-demand fetch vs. self-hosted proxy vs. bundled subset. + Affects reproducibility, offline, and cold-start latency. +3. **Bundle size / cold start** — WASM engines are large (tens of MB). Lazy-load the worker + only when the editor opens; cache aggressively (IndexedDB / service worker). +4. **Figure & big-file handling** — DVC outputs are large and may be pipeline-generated. + Pull read-only via signed URLs; don't try to round-trip them through the editor. +5. **Fidelity vs. the pipeline build** — in-browser compile may differ from the project's + canonical (Docker/DVC stage) build. This is acceptable *because* the WASM compile is + preview-only and the pipeline stays source of truth (see §3.1) — but the UI must make the + draft-vs-published distinction obvious so the difference never causes confusion. +6. **Branch-write model (Phase 3b)** — net-new backend surface; concurrency with the + existing per-user clone + auto-commit path is the trickiest part. +7. **Concurrent edits before Yjs** — Phase 1/2 are single-writer; reuse `FileLock` to avoid + two users (or the editor + Overleaf sync) clobbering each other. +8. **Relationship to existing Overleaf import/sync** — is this editor a *replacement* for + that flow, or complementary? Affects whether we keep `ImportOverleaf` prominent. +9. **Identity decoupling (§2)** — making `github_name` optional and resolving access from a + native membership table touches core auth; risk of regressing existing GitHub-derived + permissions. Needs careful migration + keeping GitHub-collaborator sync working. +10. 
**Push attribution & abuse**: pushing GitHub-less contributors' commits under a GitHub
+    App token means commit *authorship* is only as trustworthy as our auth; verify emails,
+    and rate-limit/scope join-link roles to avoid a share link becoming a write-access leak.
+11. **SSO build-vs-buy & OSS degradation (§2.5)**: a paid SSO broker is the pragmatic path
+    for university IdPs but must not become a hard dependency for self-hosted OSS builds.
+
+---
+
+## 6. Rough sequencing / sizing
+
+Two interleaved tracks, **Editor** (compile/UX) and **Identity** (onboarding/git):
+
+| Phase | Track | Outcome | Relative size |
+|---|---|---|---|
+| 0 | Editor | License decision + WASM compile spike | Small (but blocking) |
+| I1 | Identity | Google + email signup; `github_name` optional; authorship routing through the existing GitHub App push path | Medium |
+| I2 | Identity | Native `ProjectMembership` + shareable join links; native access checks | Medium |
+| 1 | Editor | Single-file editor modal, compile, auto-commit save | Medium |
+| 2 | Editor | Multi-file, figures, bib, SyncTeX, batch save, package proxy | Large |
+| 3a | Editor | Real-time collaboration (Yjs + WS relay) | Large |
+| 3b | Editor | Editing sessions as branches w/ squash-merge | Large (backend-heavy) |
+| I3 | Identity | (Deferred) University SSO (SAML/OIDC via broker) | Medium |
+| I4 | Identity | (Optional) Self-hosted git backend + GitHub mirror | Large (infra-heavy) |
+
+**Decided ordering:** Editor Phase 0 (compile spike) runs in parallel with **I1 + I2, which
+land before/with editor Phase 1**, because the first editor release must serve GitHub-less
+contributors. Editor phases 1–2 then deliver "edit & preview in the browser, versioned in
+git." Editor Phase 3 is where it becomes truly collaborative; 3a and 3b are separable, and
+the decided order is **3a (collaboration) first**. I3 (university SSO) and I4 (self-hosted
+git) are both deferred.
+
+---
+
+## 7. Proposed first concrete steps
+
+1. 
Phase 0 spike in a scratch branch: **BusyTeX** WASM in a Web Worker compiling a real + project `.tex` → PDF in browser; measure & document. (See §8.1 for the full task list.) +2. Add `codemirror`, pdf.js (already partly present via `pdfjs-dist`), and the engine + artifact to `frontend/`; scaffold `src/components/Publications/LatexEditor/`. +3. Wire the `Edit` button + full-screen modal shell with file-load and save stubs. +4. Land the manual-compile + auto-commit MVP behind a feature flag. + +The fully decomposed task breakdown for the near-term critical path (Phase 0 + I1 + I2 + +editor Phase 1) is in **§8**. + +--- + +### Open questions for review +- ~~Licensing path~~ — **DECIDED: Path 1** (§0). Only follow-up: verify the engine `.wasm` + artifact licenses before Phase 1. +- ~~Engine choice~~ — **DECIDED: upstream MIT `busytex/busytex`** (TeX Live 2023 + SyncTeX). + License verified Path-1-clean; the TeXlyre TeX Live 2026 build is AGPL and was rejected. +- ~~Git hosting strategy~~ — **DECIDED: GitHub-backed, with the git-backend abstraction + (§2.4) introduced up front.** Self-hosted git stays a pluggable backend deferred to I4. +- ~~Push credential~~ — **DECIDED/DONE: existing Calkit GitHub App installation** supplies + push for write-access users (§2.2). Remaining work is authorship routing, not credentials. +- ~~University SSO~~ — **DECIDED: deferred (I3).** Near-term onboarding = Google sign-in + + email/password signup via invite links (pick password on first sign-in). Broker-vs-build + revisited when we actually start I3. +- ~~I1+I2 sequencing~~ — **DECIDED: I1 + I2 land before/with editor Phase 1**; Phase 0 + spike runs in parallel. +- ~~Phase 3 priority~~ — **DECIDED: 3a (real-time collaboration) first**, then 3b. +- ~~Preview download~~ — **DECIDED: no download; preview is view-only** (§3.1). 
+- ~~TeX Live package server~~ — **DECIDED: defer.** Use the upstream/public package server + for Phase 0/1; self-host a cached package proxy in Phase 2 (§ Phase 2) for reproducibility + and uptime. + +--- + +## 8. Task breakdown — near-term critical path + +Four workstreams. **§8.1 (Phase 0 spike)** can start immediately and run in parallel with +**§8.2 (I1)** and **§8.3 (I2)**; **§8.4 (editor Phase 1)** depends on all three. File paths +are the current locations found during research — verify before editing. + +### 8.1 Phase 0 — BusyTeX compile spike (throwaway, scratch branch) + +Goal: prove an in-browser compile of a real Calkit paper before committing to UI work. + +**STATUS: DONE — verdict GO.** Spike lives in `spikes/latex-wasm-busytex/` (throwaway; +binaries gitignored, re-fetch via `download-assets.sh`). Verified headless in Chrome. + +- [x] ~~Obtain artifact + confirm license fits Path 1.~~ **Done (§0):** upstream MIT + `busytex/busytex` (TeX Live 2023); TeXlyre's 2026 build is AGPL and rejected. +- [x] ~~Page loads engine in a Web Worker, compiles a hello-world `.tex` to PDF, renders it.~~ + Our own loader (`main.js`) around the MIT busytex worker; PDF shown via blob-URL iframe. +- [x] ~~Compile a real-ish `.tex` with packages.~~ article + `amsmath`/`graphicx`/`hyperref` + → 121.6 KB PDF, `exit_code 0`, all packages resolved from the `texlive-basic` bundle. + *(Follow-up: try a heavier real paper from an actual project in §8.4.)* +- [x] ~~Measure & document.~~ **Cold-start ~1.5–1.8 s, compile ~0.4 s, total ~1.9–2.2 s.** + Asset size dominates: `busytex.wasm` ≈ 29 MB + `texlive-basic.data` ≈ 100 MB one-time. +- [ ] FS-seed contract decision (how files reach the worker) — carried into §8.4 (the + `{path, contents}[]` shape busytex expects maps cleanly onto `getProjectContents`). +- **Exit met:** GO with numbers + a working compile. 
**Key productionization takeaway:** the + ~130 MB one-time asset download — not compile speed — is the cost to manage (lazy-load + when the editor opens; cache in IndexedDB / service worker). + +### 8.2 I1 — Onboarding & GitHub-less authorship + +Goal: a user can create a Calkit Cloud account without GitHub, and their edits can be +authored as them and pushed via the existing GitHub App. + +**Constraint (from Pete):** GitHub-less users may **collaborate** but **cannot own/create +projects** until git hosting is decoupled (I4). Owners must have a linked GitHub account. + +**Backend** +- [x] ~~Re-enable signup~~ — `POST /users/signup` now creates an email/password user + (bcrypt) with no GitHub account. (`app/api/routes/users.py`) +- [x] ~~Make `Account.github_name` nullable + migration.~~ Column nullable + (`app/models/core.py`); migration `f3a9c1d2b4e6_make_account_github_name_nullable` + (applies cleanly to head). `create_user` no longer forces a `github_name`; None-safe + types on `User.github_username` / `UserPublic` / `AccountPublic` and the derived + comment/file-lock props; **invariant guards** keep `Org.github_name` / + `Project.owner_github_name` typed `str` (owners/orgs always have one). +- [x] ~~Owner guard.~~ `post_project` returns **403** "A linked GitHub account is required to + create or own projects" for GitHub-less users. Tests: GitHub-less signup + owner-guard + added; **full backend suite green (89 passed, 4 skipped)**. +- [ ] Finalize **Google sign-in** on the backend (token exchange + user/account creation), + pairing with the existing `google-auth.tsx` callback; store identity via + `UserExternalCredential` (provider=google). +- [x] ~~**Authorship routing.**~~ `get_repo` now uses a **GitHub App installation token** + for GitHub-less users (`github.get_app_installation_token(owner, repo)` mints a + repo-scoped token via the App JWT) instead of a personal token; GitHub users are + unchanged. 
`_configure_committer` already authors as the Calkit user (name = full_name + / email, email = `user.email`) and is None-safe; `get_repo`'s temp-path now falls back + to `account.name` when `github_username` is None. Unit-tested the token exchange (mocked + GitHub API) + 502 handling. **Caveat:** the live App-token network path and the + GitHub-less clone/push round-trip can't be exercised locally (needs the App private key + + real repo) — verify on staging. Email **verification** for authored commits is still + a TODO (currently trusts the signup email). + +**Frontend** +- [ ] Login/signup UI (`src/routes/login/`): add "Continue with Google" + email/password + sign-up alongside the existing GitHub button. +- [ ] Wire the Google callback (`src/routes/google-auth.tsx`) end-to-end through `lib/auth.ts` + token storage. + +- **Exit:** create an account via Google and via email/password (no GitHub); make an edit + through an existing write path and see a commit authored as the Calkit user, pushed by the + App. + +### 8.3 I2 — Native membership & shareable invite links + +Goal: project access resolves from a native table first, and a share link lets a new user +join and start editing. + +**Backend — DONE (full suite green, 92 passed / 4 skipped).** +- [x] ~~`ProjectMembership` table.~~ `(user_id, project_id, role_id)` mirroring + `UserOrgMembership`, with `role_name` computed (`app/models/core.py`). +- [x] ~~`ProjectInvitation` table + endpoints.~~ Token is high-entropy + (`generate_refresh_token`), only its **SHA-256 hash** is stored; `role_id`, `expires`, + `max_uses`, `use_count`, `revoked`, `is_valid`. Endpoints: **create** / **list** / + **revoke** (admin-only) + **redeem** (`POST /project-invitations/{token}` → creates + membership; 410 if revoked/expired/used-up; owners aren't downgraded). Migration + `b7e2f4a1c9d8_add_project_membership_and_invitations` (FK cascades; applies to head). 
+- [x] ~~Access resolution.~~ `get_project` now checks native `ProjectMembership` **first** + in the collaborator branch, falling back to the GitHub-derived `UserProjectAccess` + cache (extracted into `_resolve_github_collaborator_access`). GitHub access unchanged. +- [x] ~~Role-escalation guard.~~ Invites cap at **admin** (never owner); create/list/revoke + require admin access. Tests cover create+redeem→access, admin-only, revoked→410. +- [x] ~~**Repo-write for native members.**~~ Done via the §8.2 authorship-routing work — a + GitHub-less `write` member's git operations run through the App installation token, + authored as them. (Live push verification deferred to staging — see §8.2 caveat.) + +**Frontend** +- [x] ~~"Invite / share" UI.~~ Built on the **Collaborators page** (per Pete) — + `components/Projects/InviteLinks.tsx`: create-link modal (role + optional expiry/ + max-uses), one-time link reveal with copy, list with status/uses/expiry + revoke. + Client regenerated (`make client`); tsc + biome clean. *(Admin-facing — works for + existing GitHub users; not blocked by the `_layout` gate below.)* +- [ ] Invite landing route (`/join/{token}`): unauthenticated visitor → signup (§8.2) → + auto-redeem → land in the project. +- [ ] **Email/password + Google signup UI** (§8.2 frontend) — the consumer entry point. +- [ ] ⚠️ **BLOCKER for the GitHub-less consumer flow:** `src/routes/_layout.tsx` forces + **every** authenticated user to have the Calkit GitHub App installed (queries + `getUserGithubAppInstallations`; redirects to install, or logs out on the GitHub API + error a tokenless user hits). A GitHub-less user currently **cannot enter any + `_layout` route**. This gate must be relaxed for GitHub-less users before signup/join + can actually land someone in a project. Core-behavior change — worth a design pass. + +- **Exit (backend met):** a GitHub-less user signs up, redeems an invite, and gains native + access to a private project (verified by test). 
Remaining for full exit: the frontend + flow + live staging verification of the repo-write authorship routing (see §8.2 caveat). + +### 8.4 Editor Phase 1 — single-file editor MVP (depends on 8.1–8.3) + +Goal: open a publication, edit its `.tex`, compile-preview in-browser, save via auto-commit. + +**Frontend** +- [ ] Add deps: `codemirror` (v6) + LaTeX language support; reuse `pdfjs-dist`. Package the + BusyTeX engine artifact (lazy-loaded only when the editor opens). +- [ ] Scaffold `src/components/Publications/LatexEditor/`: full-screen Chakra `Modal` + (`size="full"`, closable, unsaved-changes guard) following `ArtifactCompareModal` / + `FileViewModal` precedent. +- [ ] **Edit** button on the publications page + (`src/routes/_layout/$accountName/$projectName/_layout/publications.tsx`), gated on + Write permission. +- [ ] On open: fetch the publication's `.tex` (+ sibling `.bib`) via + `ProjectsService.getProjectContents()` into the worker's virtual FS. +- [ ] CodeMirror editor pane | PDF preview pane; manual + debounced auto compile through the + §8.1 worker; collapsible log/error panel. **View-only preview, no download** (§3.1). +- [ ] Save via `putProjectContents()` (per-file) with debounce + save-on-close; auto-commit + handled by the backend. +- [ ] Feature-flag the whole entry point. + +**Backend** +- [ ] Likely no new endpoints — reuse `PUT contents` + (`backend/app/api/routes/projects/core.py` ~lines 1138–1185). Confirm `.tex` round-trips + as raw text and re-check the 1 MB file-size limit for typical sources. + +- **Exit:** edit a real paper's `.tex`, compile to PDF in-browser, save, and see the + auto-commit land + push to GitHub — as a non-GitHub user who joined via an invite link. + +**All open questions resolved.** Remaining follow-ups are verification tasks, not decisions: +confirm the BusyTeX + SwiftLaTeX engine artifact licenses fit Path 1 before Phase 1 ships.
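The invite-link contract from §8.3 (a high-entropy token, only its SHA-256 hash stored, and 410 once an invite is revoked, expired, or used up) can be sketched in isolation. This is a minimal stdlib-only illustration, not the code in this change: `new_invite_token`, `hash_token`, `Invite`, and `redeem` are hypothetical stand-ins for `generate_refresh_token`, `hash_refresh_token`, `ProjectInvitation`, and the redeem endpoint, whose real implementations may differ.

```python
import hashlib
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


def new_invite_token() -> str:
    """Return a URL-safe, high-entropy token (hypothetical helper)."""
    return secrets.token_urlsafe(32)


def hash_token(token: str) -> str:
    """Return the hex SHA-256 digest; only this value would be persisted."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


@dataclass
class Invite:
    """In-memory stand-in for an invitation row (assumed shape)."""

    token_hash: str
    expires: Optional[datetime] = None
    max_uses: Optional[int] = None
    use_count: int = 0
    revoked: bool = False

    def is_valid(self, now: datetime) -> bool:
        # Same three checks the plan lists: revoked, expired, used up.
        if self.revoked:
            return False
        if self.expires is not None and self.expires < now:
            return False
        if self.max_uses is not None and self.use_count >= self.max_uses:
            return False
        return True


def redeem(invite: Invite, token: str, now: datetime) -> int:
    """Return an HTTP-style status: 404 unknown, 410 spent, 200 redeemed."""
    # Constant-time comparison of the stored hash against the presented token.
    if not secrets.compare_digest(invite.token_hash, hash_token(token)):
        return 404
    if not invite.is_valid(now):
        return 410
    invite.use_count += 1
    return 200


if __name__ == "__main__":
    now = datetime(2026, 1, 1, tzinfo=timezone.utc)
    token = new_invite_token()
    single_use = Invite(token_hash=hash_token(token), max_uses=1)
    print(redeem(single_use, token, now))  # first redemption
    print(redeem(single_use, token, now))  # second redemption of a 1-use link
    stale = Invite(token_hash=hash_token(token), expires=now - timedelta(days=1))
    print(redeem(stale, token, now))  # expired link
```

Storing only the hash means a leaked database row cannot be turned back into a working join link, which is why the raw token is shown to the creator once.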
diff --git a/backend/app/alembic/versions/b7e2f4a1c9d8_add_project_membership_and_invitations.py b/backend/app/alembic/versions/b7e2f4a1c9d8_add_project_membership_and_invitations.py new file mode 100644 index 00000000..575e6a08 --- /dev/null +++ b/backend/app/alembic/versions/b7e2f4a1c9d8_add_project_membership_and_invitations.py @@ -0,0 +1,87 @@ +"""Add project membership and invitations + +Native (non-GitHub) project membership plus shareable invite links, so users +without GitHub accounts can be granted collaborator access. + +Revision ID: b7e2f4a1c9d8 +Revises: f3a9c1d2b4e6 +Create Date: 2026-06-17 00:00:00.000000 + +""" + +from alembic import op +import sqlalchemy as sa +import sqlmodel.sql.sqltypes + + +# revision identifiers, used by Alembic. +revision = "b7e2f4a1c9d8" +down_revision = "f3a9c1d2b4e6" +branch_labels = None +depends_on = None + + +def upgrade(): + op.create_table( + "projectmembership", + sa.Column("user_id", sa.Uuid(), nullable=False), + sa.Column("project_id", sa.Uuid(), nullable=False), + sa.Column("role_id", sa.Integer(), nullable=False), + sa.Column("created", sa.DateTime(), nullable=False), + sa.Column( + "updated", + sa.DateTime(), + server_default=sa.func.now(), + nullable=False, + ), + sa.Column("invited_by_user_id", sa.Uuid(), nullable=True), + sa.ForeignKeyConstraint( + ["user_id"], ["user.id"], ondelete="CASCADE" + ), + sa.ForeignKeyConstraint( + ["project_id"], ["project.id"], ondelete="CASCADE" + ), + sa.ForeignKeyConstraint( + ["invited_by_user_id"], ["user.id"], ondelete="SET NULL" + ), + sa.PrimaryKeyConstraint("user_id", "project_id"), + ) + op.create_table( + "projectinvitation", + sa.Column("id", sa.Uuid(), nullable=False), + sa.Column("project_id", sa.Uuid(), nullable=False), + sa.Column( + "token_hash", + sqlmodel.sql.sqltypes.AutoString(), + nullable=False, + ), + sa.Column("role_id", sa.Integer(), nullable=False), + sa.Column("created_by_user_id", sa.Uuid(), nullable=True), + sa.Column("created", sa.DateTime(), 
nullable=False), + sa.Column("expires", sa.DateTime(), nullable=True), + sa.Column("max_uses", sa.Integer(), nullable=True), + sa.Column("use_count", sa.Integer(), nullable=False), + sa.Column("revoked", sa.Boolean(), nullable=False), + sa.ForeignKeyConstraint( + ["project_id"], ["project.id"], ondelete="CASCADE" + ), + sa.ForeignKeyConstraint( + ["created_by_user_id"], ["user.id"], ondelete="SET NULL" + ), + sa.PrimaryKeyConstraint("id"), + ) + op.create_index( + op.f("ix_projectinvitation_token_hash"), + "projectinvitation", + ["token_hash"], + unique=True, + ) + + +def downgrade(): + op.drop_index( + op.f("ix_projectinvitation_token_hash"), + table_name="projectinvitation", + ) + op.drop_table("projectinvitation") + op.drop_table("projectmembership") diff --git a/backend/app/alembic/versions/f3a9c1d2b4e6_make_account_github_name_nullable.py b/backend/app/alembic/versions/f3a9c1d2b4e6_make_account_github_name_nullable.py new file mode 100644 index 00000000..546213fe --- /dev/null +++ b/backend/app/alembic/versions/f3a9c1d2b4e6_make_account_github_name_nullable.py @@ -0,0 +1,35 @@ +"""Make account github_name nullable + +Allows accounts created without GitHub (email/Google signup). Project owners +must still have a github_name (enforced in the app layer) until git hosting is +decoupled from GitHub; collaborators need not. + +Revision ID: f3a9c1d2b4e6 +Revises: dcef842dee10 +Create Date: 2026-06-17 00:00:00.000000 + +""" + +from alembic import op +import sqlalchemy as sa + + +# revision identifiers, used by Alembic. +revision = "f3a9c1d2b4e6" +down_revision = "dcef842dee10" +branch_labels = None +depends_on = None + + +def upgrade(): + op.alter_column( + "account", "github_name", existing_type=sa.VARCHAR(), nullable=True + ) + + +def downgrade(): + # Note: rows with NULL github_name (GitHub-less accounts) must be handled + # before downgrading, or this will fail. 
+ op.alter_column( + "account", "github_name", existing_type=sa.VARCHAR(), nullable=False + ) diff --git a/backend/app/api/routes/accounts.py b/backend/app/api/routes/accounts.py index c83a2d3b..94bdc800 100644 --- a/backend/app/api/routes/accounts.py +++ b/backend/app/api/routes/accounts.py @@ -17,7 +17,7 @@ class AccountPublic(SQLModel): name: str - github_name: str + github_name: str | None display_name: str kind: Literal["user", "org"] role: Literal["self", "read", "write", "admin", "owner"] | None = None diff --git a/backend/app/api/routes/projects/core.py b/backend/app/api/routes/projects/core.py index d9d70dc5..9ff1c813 100644 --- a/backend/app/api/routes/projects/core.py +++ b/backend/app/api/routes/projects/core.py @@ -9,7 +9,7 @@ import uuid import zipfile from copy import deepcopy -from datetime import datetime +from datetime import datetime, timedelta from fnmatch import fnmatch from io import StringIO from pathlib import Path @@ -48,6 +48,7 @@ ) from app.api.routes.orgs import OrgPost, post_org from app.config import settings +from app.security import generate_refresh_token, hash_refresh_token from app.core import ( CATEGORIES_PLURAL_TO_SINGULAR, CATEGORIES_SINGULAR_TO_PLURAL, @@ -88,6 +89,12 @@ ProjectComment, ProjectCommentPatch, ProjectCommentPost, + ProjectInvitation, + ProjectInvitationCreated, + ProjectInvitationPost, + ProjectInvitationPublic, + ProjectInvitationRedeemed, + ProjectMembership, ProjectPost, ProjectPublic, ProjectsPublic, @@ -98,6 +105,7 @@ UserOrgMembership, UserProjectAccess, ) +from app.models.core import ROLE_IDS from app.models.projects import ( Showcase, ShowcaseFigure, @@ -269,6 +277,13 @@ def post_project( project_in: ProjectPost, ) -> ProjectPublic: """Create new project.""" + # Project owners must have a linked GitHub account until git hosting is + # decoupled from GitHub. GitHub-less users can still collaborate. 
+ if current_user.account.github_name is None: + raise HTTPException( + 403, + "A linked GitHub account is required to create or own projects.", + ) project_in.name = project_in.name.lower() if project_in.git_repo_exists and project_in.git_repo_url is None: raise HTTPException( @@ -3787,6 +3802,159 @@ def delete_project_collaborator( return Message(message="Success") +@router.post("/projects/{owner_name}/{project_name}/invitations") +def post_project_invitation( + owner_name: str, + project_name: str, + req: ProjectInvitationPost, + current_user: CurrentUser, + session: SessionDep, +) -> ProjectInvitationCreated: + """Create a shareable invite link granting native project membership. + + The raw token is returned only here; the DB stores its hash. Invites can + grant up to admin, never ownership. + """ + project = app.projects.get_project( + owner_name=owner_name, + project_name=project_name, + session=session, + current_user=current_user, + min_access_level="admin", + ) + token = generate_refresh_token() + expires = ( + utcnow() + timedelta(days=req.expires_days) + if req.expires_days is not None + else None + ) + invitation = ProjectInvitation( + project_id=project.id, + token_hash=hash_refresh_token(token), + role_id=ROLE_IDS[req.role], + created_by_user_id=current_user.id, + expires=expires, + max_uses=req.max_uses, + ) + session.add(invitation) + session.commit() + session.refresh(invitation) + url = f"{settings.frontend_host.rstrip('/')}/join/{token}" + return ProjectInvitationCreated( + id=invitation.id, + role_name=invitation.role_name, + created=invitation.created, + expires=invitation.expires, + max_uses=invitation.max_uses, + use_count=invitation.use_count, + revoked=invitation.revoked, + token=token, + url=url, + ) + + +@router.get("/projects/{owner_name}/{project_name}/invitations") +def get_project_invitations( + owner_name: str, + project_name: str, + current_user: CurrentUser, + session: SessionDep, +) -> list[ProjectInvitationPublic]: + project = 
app.projects.get_project( + owner_name=owner_name, + project_name=project_name, + session=session, + current_user=current_user, + min_access_level="admin", + ) + invitations = session.exec( + select(ProjectInvitation) + .where(ProjectInvitation.project_id == project.id) + .order_by(sqlalchemy.desc(ProjectInvitation.created)) # type: ignore + ).all() + return list(invitations) # type: ignore[return-value] + + +@router.delete( + "/projects/{owner_name}/{project_name}/invitations/{invitation_id}" +) +def delete_project_invitation( + owner_name: str, + project_name: str, + invitation_id: uuid.UUID, + current_user: CurrentUser, + session: SessionDep, +) -> Message: + project = app.projects.get_project( + owner_name=owner_name, + project_name=project_name, + session=session, + current_user=current_user, + min_access_level="admin", + ) + invitation = session.get(ProjectInvitation, invitation_id) + if invitation is None or invitation.project_id != project.id: + raise HTTPException(404, "Invitation not found") + invitation.revoked = True + session.add(invitation) + session.commit() + return Message(message="Invitation revoked") + + +@router.post("/project-invitations/{token}") +def post_project_invitation_redemption( + token: str, + current_user: CurrentUser, + session: SessionDep, +) -> ProjectInvitationRedeemed: + """Redeem an invite link, granting the current user native membership.""" + invitation = session.exec( + select(ProjectInvitation).where( + ProjectInvitation.token_hash == hash_refresh_token(token) + ) + ).first() + if invitation is None: + raise HTTPException(404, "Invitation not found") + if not invitation.is_valid: + raise HTTPException(410, "Invitation is no longer valid") + project = session.get(Project, invitation.project_id) + if project is None: + raise HTTPException(404, "Project not found") + # Project owners already have full access; don't create a lesser membership. 
+ if project.owner_account.user_id == current_user.id: + return ProjectInvitationRedeemed( + owner_name=project.owner_account.name, + project_name=project.name, + role_name="owner", + ) + existing = session.exec( + select(ProjectMembership) + .where(ProjectMembership.project_id == project.id) + .where(ProjectMembership.user_id == current_user.id) + ).first() + if existing is None: + session.add( + ProjectMembership( + user_id=current_user.id, + project_id=project.id, + role_id=invitation.role_id, + invited_by_user_id=invitation.created_by_user_id, + ) + ) + elif invitation.role_id > existing.role_id: + # Upgrade if the invite grants more than they already have. + existing.role_id = invitation.role_id + session.add(existing) + invitation.use_count += 1 + session.add(invitation) + session.commit() + return ProjectInvitationRedeemed( + owner_name=project.owner_account.name, + project_name=project.name, + role_name=invitation.role_name, + ) + + class Issue(BaseModel): id: int number: int diff --git a/backend/app/api/routes/users.py b/backend/app/api/routes/users.py index e03e014d..432a85ef 100644 --- a/backend/app/api/routes/users.py +++ b/backend/app/api/routes/users.py @@ -161,8 +161,11 @@ def delete_current_user( @router.post("/users/signup") def register_user(session: SessionDep, user_in: UserRegister) -> UserPublic: - """Create new user without the need to be logged in.""" - raise HTTPException(501) + """Create a new user with email + password, without a GitHub account. + + Such users can collaborate on projects (e.g. via invite links) but cannot + own projects until git hosting is decoupled from GitHub. 
+ """ user = users.get_user_by_email(session=session, email=user_in.email) if user: raise HTTPException( diff --git a/backend/app/git.py b/backend/app/git.py index 42baea88..1b5058a0 100644 --- a/backend/app/git.py +++ b/backend/app/git.py @@ -20,7 +20,7 @@ from git.exc import GitCommandError from sqlmodel import Session -from app import users +from app import github, users from app.core import logger, ryaml from app.models import GitRef, Project, User @@ -146,7 +146,10 @@ def get_repo( # Add the file to the repo(s) -- we may need to clone it. # Ref-based reads should not mutate this working tree checkout. if user is not None: - base_dir = f"/tmp/{user.github_username}/{owner_name}/{project_name}" + # github_username is None for GitHub-less users; fall back to the + # (always-present, unique) account name for a stable temp path. + user_dir = user.github_username or user.account.name + base_dir = f"/tmp/{user_dir}/{owner_name}/{project_name}" else: base_dir = f"/tmp/anonymous/{owner_name}/{project_name}" repo_dir = os.path.join(base_dir, "repo") @@ -160,9 +163,24 @@ def get_repo( # Clone the repo if it doesn't exist -- it will be in a "repo" dir access_token: str | None = None if user is not None: - logger.info(f"Getting {user.email}'s access token for Git operations") - with _timed("get-github-token", user=user.github_username): - access_token = users.get_github_token(session=session, user=user) + if user.account.github_name is not None: + # GitHub user: operate with their personal token. + logger.info(f"Getting {user.email}'s token for Git operations") + with _timed("get-github-token", user=user.github_username): + access_token = users.get_github_token( + session=session, user=user + ) + else: + # GitHub-less member: access was authorized natively upstream + # (e.g. via an invite). Operate via the GitHub App installation + # token for the repo; commits are still authored as this user. 
+ logger.info( + f"Getting GitHub App installation token for {user.email}" + ) + with _timed("get-app-installation-token", user=user.email): + access_token = github.get_app_installation_token( + owner_name, project_name + ) # Plain URL with no embedded token -- credentials handled in helper git_plain_url = project.git_repo_url if not git_plain_url.endswith(".git"): diff --git a/backend/app/github.py b/backend/app/github.py index b2188a78..6f46ac93 100644 --- a/backend/app/github.py +++ b/backend/app/github.py @@ -4,6 +4,8 @@ import time import jwt +import requests +from fastapi import HTTPException def create_app_token() -> str: @@ -25,6 +27,43 @@ def create_app_token() -> str: return encoded_jwt +def get_app_installation_token(owner_name: str, repo_name: str) -> str: + """Mint a GitHub App installation access token scoped to one repo. + + Used to perform git operations on behalf of users who have native Calkit + access to a project but no personal GitHub token (e.g. email/Google + signups). The caller must have authorized the user's access first. 
+ """ + app_jwt = create_app_token() + headers = { + "Authorization": f"Bearer {app_jwt}", + "Accept": "application/vnd.github+json", + } + resp = requests.get( + f"https://api.github.com/repos/{owner_name}/{repo_name}/installation", + headers=headers, + timeout=15, + ) + if resp.status_code != 200: + raise HTTPException( + 502, + "Could not find the Calkit GitHub App installation for this repo", + ) + installation_id = resp.json()["id"] + resp = requests.post( + f"https://api.github.com/app/installations/{installation_id}" + "/access_tokens", + headers=headers, + json={"repositories": [repo_name]}, + timeout=15, + ) + if resp.status_code not in (200, 201): + raise HTTPException( + 502, "Could not mint a GitHub App installation token" + ) + return resp.json()["token"] + + def token_resp_text_to_dict(resp_text: str) -> dict: items = resp_text.split("&") out = {} diff --git a/backend/app/models/core.py b/backend/app/models/core.py index 462501a5..e94eb1df 100644 --- a/backend/app/models/core.py +++ b/backend/app/models/core.py @@ -32,7 +32,10 @@ class Account(SQLModel, table=True): org_id: uuid.UUID | None = Field( default=None, foreign_key="org.id", nullable=True ) - github_name: str + # Null for accounts created without GitHub (email/Google signup). Project + # owners must still have a github_name until git hosting is decoupled from + # GitHub (see LATEX_EDITOR_PLAN.md I4); collaborators need not. 
+ github_name: str | None = Field(default=None) # Relationships owned_projects: list["Project"] = Relationship( back_populates="owner_account", @@ -64,7 +67,7 @@ class UserBase(SQLModel): class UserCreate(UserBase): password: str = Field(min_length=8, max_length=40) account_name: str | None = Field(default=None, max_length=64) - github_username: str = Field(default=None, max_length=64) + github_username: str | None = Field(default=None, max_length=64) class UserRegister(SQLModel): @@ -216,7 +219,7 @@ class User(UserBase, table=True): @computed_field @property - def github_username(self) -> str: + def github_username(self) -> str | None: return self.account.github_name @property @@ -235,7 +238,7 @@ def get_external_credential( # Properties to return via API, id is always required class UserPublic(UserBase): id: uuid.UUID - github_username: str + github_username: str | None subscription: Union["UserSubscription", None] @@ -261,6 +264,9 @@ def display_name(self) -> str: @computed_field @property def github_name(self) -> str: + # Orgs are always created with a GitHub name. + if self.account.github_name is None: + raise ValueError("Org account has no github_name") return self.account.github_name @property @@ -531,6 +537,13 @@ class Project(ProjectBase, table=True): user_access_records: list["UserProjectAccess"] = Relationship( back_populates="project", cascade_delete=True ) + # Native (non-GitHub) project membership and shareable invite links. 
+ memberships: list["ProjectMembership"] = Relationship( + back_populates="project", cascade_delete=True + ) + invitations: list["ProjectInvitation"] = Relationship( + back_populates="project", cascade_delete=True + ) # TODO: Figure out how to do self-referential relationships with parent # and children projects questions: list["Question"] = Relationship( @@ -567,6 +580,10 @@ def owner_account_type(self) -> str: @computed_field @property def owner_github_name(self) -> str: + # Project owners must have a GitHub account (collaborators need not) + # until git hosting is decoupled from GitHub. + if self.owner_account.github_name is None: + raise ValueError("Project owner account has no github_name") return self.owner_account.github_name @property @@ -642,6 +659,101 @@ class UserProjectAccess(SQLModel, table=True): project: Project = Relationship(back_populates="user_access_records") +class ProjectMembership(SQLModel, table=True): + """Native (non-GitHub) membership of a user in a project. + + Resolved before GitHub-derived access in ``get_project``, and the only + access path for GitHub-less collaborators. + """ + + user_id: uuid.UUID = Field(foreign_key="user.id", primary_key=True) + project_id: uuid.UUID = Field(foreign_key="project.id", primary_key=True) + # Membership cannot grant ownership; capped at admin by the API. 
+ role_id: int = Field(ge=min(ROLE_IDS.values()), le=max(ROLE_IDS.values())) + created: datetime = Field(default_factory=utcnow) + updated: datetime = Field( + default_factory=utcnow, + sa_column_kwargs=dict( + server_onupdate=sqlalchemy.func.now(), + server_default=sqlalchemy.func.now(), + ), + ) + invited_by_user_id: uuid.UUID | None = Field( + default=None, foreign_key="user.id" + ) + # Relationships (no User relationship: two user FKs would be ambiguous) + project: Project = Relationship(back_populates="memberships") + + @computed_field + @property + def role_name(self) -> str: + return ROLE_NAMES[self.role_id] + + +class ProjectInvitation(SQLModel, table=True): + """A shareable invite link granting project membership when redeemed.""" + + id: uuid.UUID = Field(default_factory=uuid.uuid4, primary_key=True) + project_id: uuid.UUID = Field(foreign_key="project.id") + # Only the SHA-256 hash of the token is stored; the raw token lives in the + # invite URL and is shown to the creator once. 
+ token_hash: str = Field(unique=True, index=True) + role_id: int = Field(ge=min(ROLE_IDS.values()), le=max(ROLE_IDS.values())) + created_by_user_id: uuid.UUID | None = Field( + default=None, foreign_key="user.id" + ) + created: datetime = Field(default_factory=utcnow) + expires: datetime | None = Field(default=None) + max_uses: int | None = Field(default=None) + use_count: int = Field(default=0) + revoked: bool = Field(default=False) + # Relationships + project: Project = Relationship(back_populates="invitations") + + @computed_field + @property + def role_name(self) -> str: + return ROLE_NAMES[self.role_id] + + @property + def is_valid(self) -> bool: + if self.revoked: + return False + if self.expires is not None and self.expires < utcnow(): + return False + if self.max_uses is not None and self.use_count >= self.max_uses: + return False + return True + + +class ProjectInvitationPost(SQLModel): + role: Literal["read", "write", "admin"] = "write" + expires_days: int | None = Field(default=None, ge=1, le=365) + max_uses: int | None = Field(default=None, ge=1) + + +class ProjectInvitationPublic(SQLModel): + id: uuid.UUID + role_name: str + created: datetime + expires: datetime | None + max_uses: int | None + use_count: int + revoked: bool + + +class ProjectInvitationCreated(ProjectInvitationPublic): + # Raw token + ready-to-share URL, returned only at creation time. 
+ token: str + url: str + + +class ProjectInvitationRedeemed(SQLModel): + owner_name: str + project_name: str + role_name: str + + class DvcPipelineStage(SQLModel): cmd: str deps: list[str] | None = None @@ -750,7 +862,7 @@ class ProjectComment(SQLModel, table=True): @computed_field @property - def user_github_username(self) -> str: + def user_github_username(self) -> str | None: return self.user.github_username @computed_field @@ -887,7 +999,7 @@ class FileLock(SQLModel, table=True): @computed_field @property - def user_github_username(self) -> str: + def user_github_username(self) -> str | None: return self.user.github_username @computed_field diff --git a/backend/app/projects.py b/backend/app/projects.py index 286e9792..ab5d6966 100644 --- a/backend/app/projects.py +++ b/backend/app/projects.py @@ -43,6 +43,7 @@ def _yaml_load(data: bytes | str): Notebook, Org, Project, + ProjectMembership, Publication, User, UserProjectAccess, @@ -77,6 +78,64 @@ def _yaml_load(data: bytes | str): ) +def _resolve_github_collaborator_access( + session: Session, project: Project, current_user: User +) -> None: + """Resolve a non-member user's access from the cached GitHub permission, + querying GitHub and caching the result on a miss. Sets + ``project.current_user_access`` (left None if it can't be determined). + """ + # TODO: There may be a race here with concurrent requests, though it does + # not appear to cause a real problem despite the failed writes. 
+ access_query = ( + select(UserProjectAccess) + .where(UserProjectAccess.project_id == project.id) + .where(UserProjectAccess.user_id == current_user.id) + .with_for_update() + ) + access = session.exec(access_query).first() + if access is not None: + project.current_user_access = access.access + return + # Query GitHub for permissions + try: + github_token = app.users.get_github_token(session, current_user) + except HTTPException: + github_token = None + logger.info(f"User {current_user.email} has no GitHub token") + if github_token is None: + return + logger.info("Fetching permissions from GitHub") + url = ( + f"https://api.github.com/repos/{project.github_repo}" + f"/collaborators/{current_user.github_username}/permission" + ) + resp = requests.get( + url, + headers={"Authorization": f"Bearer {github_token}"}, + timeout=15, + ) + if resp.status_code == 200: + logger.info("Fetched permissions from GitHub") + permissions = resp.json()["permission"] + if permissions == "none": + permissions = None + else: + permissions = None + logger.info( + f"Failed to fetch permissions from GitHub ({resp.status_code})" + ) + project.current_user_access = permissions + session.add( + UserProjectAccess( + project_id=project.id, + user_id=current_user.id, + access=permissions, + ) + ) + session.commit() + + def get_project( session: Session, owner_name: str, @@ -127,64 +186,20 @@ def get_project( if project.current_user_access is None and project.is_public: project.current_user_access = "read" else: - # Query for permissions in our database, and if they aren't set, - # query GitHub and save - # TODO: We seem to have a race condition here with multiple - # requests causing this to run concurrently, though it doesn't - # seem to actually cause a problem despite the failure to write - # to the database in all but one - access_query = ( - select(UserProjectAccess) - .where(UserProjectAccess.project_id == project.id) - .where(UserProjectAccess.user_id == current_user.id) - 
.with_for_update() - ) - access = session.exec(access_query).first() - if access is not None: - project.current_user_access = access.access + # Non-owner: native Calkit membership takes precedence over + # GitHub-derived access, and is the only access path for + # GitHub-less collaborators. + membership = session.exec( + select(ProjectMembership) + .where(ProjectMembership.project_id == project.id) + .where(ProjectMembership.user_id == current_user.id) + ).first() + if membership is not None: + project.current_user_access = membership.role_name else: - # Query GitHub for permissions - try: - github_token = app.users.get_github_token( - session, current_user - ) - except HTTPException: - github_token = None - logger.info( - f"User {current_user.email} has no GitHub token" - ) - if github_token is not None: - logger.info("Fetching permissions from GitHub") - url = ( - f"https://api.github.com/repos/{project.github_repo}" - f"/collaborators/{current_user.github_username}/" - "permission" - ) - resp = requests.get( - url, - headers={"Authorization": f"Bearer {github_token}"}, - timeout=15, - ) - if resp.status_code == 200: - logger.info("Fetched permissions from GitHub") - permissions = resp.json()["permission"] - if permissions == "none": - permissions = None - else: - permissions = None - logger.info( - "Failed to fetch permissions from GitHub " - f"({resp.status_code})" - ) - project.current_user_access = permissions - session.add( - UserProjectAccess( - project_id=project.id, - user_id=current_user.id, - access=permissions, - ) - ) - session.commit() + _resolve_github_collaborator_access( + session, project, current_user + ) if project.is_public and project.current_user_access is None: project.current_user_access = "read" if project.current_user_access is None: diff --git a/backend/app/tests/api/routes/projects/test_core.py b/backend/app/tests/api/routes/projects/test_core.py index a374f38c..201d88f9 100644 --- a/backend/app/tests/api/routes/projects/test_core.py 
+++ b/backend/app/tests/api/routes/projects/test_core.py @@ -1,12 +1,17 @@ """Tests for app.api.routes.projects.core endpoints.""" +import uuid from types import SimpleNamespace from unittest.mock import ANY, patch +from app import users from app.api.routes.projects.core import get_project_comments from app.config import settings -from app.models.core import ContentsItem +from app.models import Project, UserCreate +from app.models.core import ContentsItem, ProjectMembership +from app.tests import authentication_token_from_email, create_random_user from fastapi.testclient import TestClient +from sqlmodel import Session, select def test_get_project_contents_forwards_ref(client: TestClient) -> None: @@ -755,3 +760,136 @@ def test_get_project_presentations_reads_declared_at_ref( _ref_aware_endpoint_reads_declared_at_ref( client, "presentations", "presentations" ) + + +# --- Project membership & invitation links (I2) --------------------------- + + +def _make_owner_with_project( + db: Session, client: TestClient +) -> tuple[Project, dict[str, str]]: + """Create a project owner (with GitHub) + a private project, return the + project and the owner's auth headers. 
+ """ + suffix = uuid.uuid4().hex[:8] + owner = users.create_user( + session=db, + user_create=UserCreate( + email=f"owner-{suffix}@example.com", + password="ownerpassword123", + account_name=f"owner{suffix}", + github_username=f"owner{suffix}", + ), + ) + project = Project( + name=f"proj-{suffix}", + title="Invite Test Project", + git_repo_url=f"https://github.com/owner{suffix}/proj-{suffix}", + owner_account_id=owner.account.id, + owner_account=owner.account, + ) + db.add(project) + db.commit() + db.refresh(project) + headers = authentication_token_from_email( + client=client, email=owner.email, db=db + ) + return project, headers + + +def test_invitation_create_and_redeem_grants_access( + client: TestClient, db: Session +) -> None: + project, owner_headers = _make_owner_with_project(db, client) + owner_name = project.owner_account.name + base = f"{settings.API_V1_STR}/projects/{owner_name}/{project.name}" + + # A GitHub-less user has no access to the private project yet. + ghless = create_random_user(db) + assert ghless.account.github_name is None + ghless_headers = authentication_token_from_email( + client=client, email=ghless.email, db=db + ) + r = client.get(base, headers=ghless_headers) + assert r.status_code == 403 + + # Owner creates an invite link. + r = client.post( + f"{base}/invitations", + headers=owner_headers, + json={"role": "write", "max_uses": 5}, + ) + assert r.status_code == 200, r.text + invite = r.json() + assert invite["role_name"] == "write" + assert invite["token"] + assert f"/join/{invite['token']}" in invite["url"] + token = invite["token"] + + # The GitHub-less user redeems it and gains write membership. 
+ r = client.post( + f"{settings.API_V1_STR}/project-invitations/{token}", + headers=ghless_headers, + ) + assert r.status_code == 200, r.text + redeemed = r.json() + assert redeemed["owner_name"] == owner_name + assert redeemed["project_name"] == project.name + assert redeemed["role_name"] == "write" + + # Membership row exists and the user can now read the project. + membership = db.exec( + select(ProjectMembership) + .where(ProjectMembership.project_id == project.id) + .where(ProjectMembership.user_id == ghless.id) + ).first() + assert membership is not None and membership.role_name == "write" + r = client.get(base, headers=ghless_headers) + assert r.status_code == 200 + + +def test_invitation_create_requires_admin( + client: TestClient, db: Session +) -> None: + project, _ = _make_owner_with_project(db, client) + owner_name = project.owner_account.name + # A random non-member cannot create invitations. + other = create_random_user(db) + other_headers = authentication_token_from_email( + client=client, email=other.email, db=db + ) + r = client.post( + f"{settings.API_V1_STR}/projects/{owner_name}/{project.name}" + "/invitations", + headers=other_headers, + json={"role": "write"}, + ) + assert r.status_code == 403 + + +def test_redeem_revoked_invitation_fails( + client: TestClient, db: Session +) -> None: + project, owner_headers = _make_owner_with_project(db, client) + owner_name = project.owner_account.name + base = f"{settings.API_V1_STR}/projects/{owner_name}/{project.name}" + r = client.post( + f"{base}/invitations", headers=owner_headers, json={"role": "read"} + ) + assert r.status_code == 200 + invite = r.json() + # Revoke it. + r = client.delete( + f"{base}/invitations/{invite['id']}", headers=owner_headers + ) + assert r.status_code == 200 + # Redeeming a revoked invite is rejected. 
+ redeemer = create_random_user(db) + redeemer_headers = authentication_token_from_email( + client=client, email=redeemer.email, db=db + ) + r = client.post( + f"{settings.API_V1_STR}/project-invitations/{invite['token']}", + headers=redeemer_headers, + ) + assert r.status_code == 410 diff --git a/backend/app/tests/api/routes/test_users.py b/backend/app/tests/api/routes/test_users.py index e3f66e9a..e3a77931 100644 --- a/backend/app/tests/api/routes/test_users.py +++ b/backend/app/tests/api/routes/test_users.py @@ -286,15 +286,24 @@ def test_update_password_me_same_password_error( def test_register_user(client: TestClient, db: Session) -> None: - username = random_email() + """A user can self-register with email + password and no GitHub account.""" + email = random_email() password = random_lower_string() full_name = random_lower_string() - data = {"email": username, "password": password, "full_name": full_name} + data = {"email": email, "password": password, "full_name": full_name} r = client.post( f"{settings.API_V1_STR}/users/signup", json=data, ) - assert r.status_code == 501 + assert 200 <= r.status_code < 300 + created = r.json() + assert created["email"] == email + # GitHub-less signup: no GitHub username on the public payload + assert created["github_username"] is None + user = users.get_user_by_email(session=db, email=email) + assert user is not None + assert user.account.github_name is None + assert verify_password(password, user.hashed_password) def test_register_user_already_exists_error(client: TestClient) -> None: @@ -309,7 +318,31 @@ def test_register_user_already_exists_error(client: TestClient) -> None: f"{settings.API_V1_STR}/users/signup", json=data, ) - assert r.status_code == 501 + assert r.status_code == 400 + + +def test_github_less_user_cannot_create_project(client: TestClient) -> None: + """GitHub-less users can sign up but cannot own projects (yet).""" + email = random_email() + password = random_lower_string() + r = client.post( + 
f"{settings.API_V1_STR}/users/signup", + json={"email": email, "password": password}, + ) + assert 200 <= r.status_code < 300 + login = client.post( + f"{settings.API_V1_STR}/login/access-token", + data={"username": email, "password": password}, + ) + assert login.status_code == 200 + headers = {"Authorization": f"Bearer {login.json()['access_token']}"} + r = client.post( + f"{settings.API_V1_STR}/projects", + headers=headers, + json={"name": "ghless-project", "title": "GitHub-less project"}, + ) + assert r.status_code == 403 + assert "GitHub" in r.json()["detail"] def test_update_user( diff --git a/backend/app/tests/test_git.py b/backend/app/tests/test_git.py index f213f80d..e8f4a50e 100644 --- a/backend/app/tests/test_git.py +++ b/backend/app/tests/test_git.py @@ -1,13 +1,23 @@ """Tests for app.git.""" -import uuid from pathlib import Path import git +import pytest +from fastapi import HTTPException import app.git +import app.github import app.projects -from app.models import Account, Project + + +class _FakeResp: + def __init__(self, status_code: int, payload: dict) -> None: + self.status_code = status_code + self._payload = payload + + def json(self) -> dict: + return self._payload def _init_repo(repo_dir: Path) -> tuple[git.Repo, str]: @@ -125,3 +135,42 @@ def test_get_file_history_dvc_lock(tmp_path, monkeypatch): assert len(history) == 2 # Newest first assert history[0]["committed_date"] >= history[-1]["committed_date"] + + +def test_get_app_installation_token(monkeypatch) -> None: + """The App JWT is exchanged for a repo-scoped installation token.""" + calls: dict = {} + monkeypatch.setattr(app.github, "create_app_token", lambda: "fake-jwt") + + def fake_get(url, headers=None, timeout=None): + calls["get_url"] = url + calls["get_auth"] = headers["Authorization"] + return _FakeResp(200, {"id": 12345}) + + def fake_post(url, headers=None, json=None, timeout=None): + calls["post_url"] = url + calls["post_json"] = json + return _FakeResp(201, {"token": 
"ghs_installationtoken"}) + + monkeypatch.setattr(app.github.requests, "get", fake_get) + monkeypatch.setattr(app.github.requests, "post", fake_post) + + token = app.github.get_app_installation_token("owner-acct", "my-repo") + assert token == "ghs_installationtoken" + assert calls["get_url"].endswith("/repos/owner-acct/my-repo/installation") + assert calls["get_auth"] == "Bearer fake-jwt" + assert "/app/installations/12345/access_tokens" in calls["post_url"] + assert calls["post_json"] == {"repositories": ["my-repo"]} + + +def test_get_app_installation_token_no_installation(monkeypatch) -> None: + """A missing installation surfaces as a 502, not a crash.""" + monkeypatch.setattr(app.github, "create_app_token", lambda: "fake-jwt") + monkeypatch.setattr( + app.github.requests, + "get", + lambda *a, **k: _FakeResp(404, {}), + ) + with pytest.raises(HTTPException) as exc: + app.github.get_app_installation_token("owner", "repo") + assert exc.value.status_code == 502 diff --git a/backend/app/users.py b/backend/app/users.py index f0bfd017..d56fc407 100644 --- a/backend/app/users.py +++ b/backend/app/users.py @@ -120,7 +120,9 @@ def create_user(*, session: Session, user_create: UserCreate) -> User: account_name = user_create.account_name or user_create.github_username if not account_name: account_name = user_create.email.split("@")[0] - github_name = user_create.github_username or account_name + # Only set a GitHub name when the user actually has a GitHub account; + # GitHub-less (email/Google) signups leave it null. 
+ github_name = user_create.github_username if account_name.lower() in INVALID_ACCOUNT_NAMES: raise HTTPException(422, "Invalid account name") existing = session.exec( diff --git a/frontend/src/client/schemas.gen.ts b/frontend/src/client/schemas.gen.ts index 59686efb..82ec377d 100644 --- a/frontend/src/client/schemas.gen.ts +++ b/frontend/src/client/schemas.gen.ts @@ -7,7 +7,14 @@ export const AccountPublicSchema = { title: "Name", }, github_name: { - type: "string", + anyOf: [ + { + type: "string", + }, + { + type: "null", + }, + ], title: "Github Name", }, display_name: { @@ -1577,7 +1584,14 @@ export const FileLockSchema = { title: "User Id", }, user_github_username: { - type: "string", + anyOf: [ + { + type: "string", + }, + { + type: "null", + }, + ], title: "User Github Username", readOnly: true, }, @@ -3599,7 +3613,14 @@ export const ProjectCommentSchema = { title: "Git Rev", }, user_github_username: { - type: "string", + anyOf: [ + { + type: "string", + }, + { + type: "null", + }, + ], title: "User Github Username", readOnly: true, }, @@ -3730,6 +3751,196 @@ export const ProjectCommentPostSchema = { title: "ProjectCommentPost", } as const +export const ProjectInvitationCreatedSchema = { + properties: { + id: { + type: "string", + format: "uuid", + title: "Id", + }, + role_name: { + type: "string", + title: "Role Name", + }, + created: { + type: "string", + format: "date-time", + title: "Created", + }, + expires: { + anyOf: [ + { + type: "string", + format: "date-time", + }, + { + type: "null", + }, + ], + title: "Expires", + }, + max_uses: { + anyOf: [ + { + type: "integer", + }, + { + type: "null", + }, + ], + title: "Max Uses", + }, + use_count: { + type: "integer", + title: "Use Count", + }, + revoked: { + type: "boolean", + title: "Revoked", + }, + token: { + type: "string", + title: "Token", + }, + url: { + type: "string", + title: "Url", + }, + }, + type: "object", + required: [ + "id", + "role_name", + "created", + "expires", + "max_uses", + 
"use_count", + "revoked", + "token", + "url", + ], + title: "ProjectInvitationCreated", +} as const + +export const ProjectInvitationPostSchema = { + properties: { + role: { + type: "string", + enum: ["read", "write", "admin"], + title: "Role", + default: "write", + }, + expires_days: { + anyOf: [ + { + type: "integer", + maximum: 365, + minimum: 1, + }, + { + type: "null", + }, + ], + title: "Expires Days", + }, + max_uses: { + anyOf: [ + { + type: "integer", + minimum: 1, + }, + { + type: "null", + }, + ], + title: "Max Uses", + }, + }, + type: "object", + title: "ProjectInvitationPost", +} as const + +export const ProjectInvitationPublicSchema = { + properties: { + id: { + type: "string", + format: "uuid", + title: "Id", + }, + role_name: { + type: "string", + title: "Role Name", + }, + created: { + type: "string", + format: "date-time", + title: "Created", + }, + expires: { + anyOf: [ + { + type: "string", + format: "date-time", + }, + { + type: "null", + }, + ], + title: "Expires", + }, + max_uses: { + anyOf: [ + { + type: "integer", + }, + { + type: "null", + }, + ], + title: "Max Uses", + }, + use_count: { + type: "integer", + title: "Use Count", + }, + revoked: { + type: "boolean", + title: "Revoked", + }, + }, + type: "object", + required: [ + "id", + "role_name", + "created", + "expires", + "max_uses", + "use_count", + "revoked", + ], + title: "ProjectInvitationPublic", +} as const + +export const ProjectInvitationRedeemedSchema = { + properties: { + owner_name: { + type: "string", + title: "Owner Name", + }, + project_name: { + type: "string", + title: "Project Name", + }, + role_name: { + type: "string", + title: "Role Name", + }, + }, + type: "object", + required: ["owner_name", "project_name", "role_name"], + title: "ProjectInvitationRedeemed", +} as const + export const ProjectOptionalExtendedSchema = { properties: { name: { @@ -5506,8 +5717,15 @@ export const UserCreateSchema = { title: "Account Name", }, github_username: { - type: "string", - 
maxLength: 64, + anyOf: [ + { + type: "string", + maxLength: 64, + }, + { + type: "null", + }, + ], title: "Github Username", }, }, @@ -5552,7 +5770,14 @@ export const UserPublicSchema = { title: "Id", }, github_username: { - type: "string", + anyOf: [ + { + type: "string", + }, + { + type: "null", + }, + ], title: "Github Username", }, subscription: { diff --git a/frontend/src/client/sdk.gen.ts b/frontend/src/client/sdk.gen.ts index 8c0a8e49..ac017d65 100644 --- a/frontend/src/client/sdk.gen.ts +++ b/frontend/src/client/sdk.gen.ts @@ -143,6 +143,14 @@ import type { PutProjectCollaboratorResponse, DeleteProjectCollaboratorData, DeleteProjectCollaboratorResponse, + PostProjectInvitationData, + PostProjectInvitationResponse, + GetProjectInvitationsData, + GetProjectInvitationsResponse, + DeleteProjectInvitationData, + DeleteProjectInvitationResponse, + PostProjectInvitationRedemptionData, + PostProjectInvitationRedemptionResponse, GetProjectIssuesData, GetProjectIssuesResponse, PostProjectIssueData, @@ -2168,6 +2176,110 @@ export class ProjectsService { }) } + /** + * Post Project Invitation + * Create a shareable invite link granting native project membership. + * + * The raw token is returned only here; the DB stores its hash. Invites can + * grant up to admin, never ownership. + * @param data The data for the request. + * @param data.ownerName + * @param data.projectName + * @param data.requestBody + * @returns ProjectInvitationCreated Successful Response + * @throws ApiError + */ + public static postProjectInvitation( + data: PostProjectInvitationData, + ): CancelablePromise { + return __request(OpenAPI, { + method: "POST", + url: "/projects/{owner_name}/{project_name}/invitations", + path: { + owner_name: data.ownerName, + project_name: data.projectName, + }, + body: data.requestBody, + mediaType: "application/json", + errors: { + 422: "Validation Error", + }, + }) + } + + /** + * Get Project Invitations + * @param data The data for the request. 
+ * @param data.ownerName + * @param data.projectName + * @returns ProjectInvitationPublic Successful Response + * @throws ApiError + */ + public static getProjectInvitations( + data: GetProjectInvitationsData, + ): CancelablePromise { + return __request(OpenAPI, { + method: "GET", + url: "/projects/{owner_name}/{project_name}/invitations", + path: { + owner_name: data.ownerName, + project_name: data.projectName, + }, + errors: { + 422: "Validation Error", + }, + }) + } + + /** + * Delete Project Invitation + * @param data The data for the request. + * @param data.ownerName + * @param data.projectName + * @param data.invitationId + * @returns Message Successful Response + * @throws ApiError + */ + public static deleteProjectInvitation( + data: DeleteProjectInvitationData, + ): CancelablePromise { + return __request(OpenAPI, { + method: "DELETE", + url: "/projects/{owner_name}/{project_name}/invitations/{invitation_id}", + path: { + owner_name: data.ownerName, + project_name: data.projectName, + invitation_id: data.invitationId, + }, + errors: { + 422: "Validation Error", + }, + }) + } + + /** + * Post Project Invitation Redemption + * Redeem an invite link, granting the current user native membership. + * @param data The data for the request. + * @param data.token + * @returns ProjectInvitationRedeemed Successful Response + * @throws ApiError + */ + public static postProjectInvitationRedemption( + data: PostProjectInvitationRedemptionData, + ): CancelablePromise { + return __request(OpenAPI, { + method: "POST", + url: "/project-invitations/{token}", + path: { + token: data.token, + }, + errors: { + 422: "Validation Error", + }, + }) + } + /** * Get Project Issues * @param data The data for the request. @@ -2913,7 +3025,10 @@ export class UsersService { /** * Register User - * Create new user without the need to be logged in. + * Create a new user with email + password, without a GitHub account. + * + * Such users can collaborate on projects (e.g. 
via invite links) but cannot + * own projects until git hosting is decoupled from GitHub. * @param data The data for the request. * @param data.requestBody * @returns UserPublic Successful Response diff --git a/frontend/src/client/types.gen.ts b/frontend/src/client/types.gen.ts index 674ca6b5..6a4d082c 100644 --- a/frontend/src/client/types.gen.ts +++ b/frontend/src/client/types.gen.ts @@ -17,7 +17,7 @@ export type _ContentsItemBase = { export type AccountPublic = { name: string - github_name: string + github_name: string | null display_name: string kind: "user" | "org" role?: "self" | "read" | "write" | "admin" | "owner" | null @@ -362,7 +362,7 @@ export type FileLock = { path: string created?: string user_id: string - readonly user_github_username: string + readonly user_github_username: string | null readonly user_email: string } @@ -809,7 +809,7 @@ export type ProjectComment = { resolved?: string | null git_ref?: string | null git_rev?: string | null - readonly user_github_username: string + readonly user_github_username: string | null readonly user_full_name: string | null readonly user_email: string } @@ -828,6 +828,42 @@ export type ProjectCommentPost = { git_ref?: string | null } +export type ProjectInvitationCreated = { + id: string + role_name: string + created: string + expires: string | null + max_uses: number | null + use_count: number + revoked: boolean + token: string + url: string +} + +export type ProjectInvitationPost = { + role?: "read" | "write" | "admin" + expires_days?: number | null + max_uses?: number | null +} + +export type role2 = "read" | "write" | "admin" + +export type ProjectInvitationPublic = { + id: string + role_name: string + created: string + expires: string | null + max_uses: number | null + use_count: number + revoked: boolean +} + +export type ProjectInvitationRedeemed = { + owner_name: string + project_name: string + role_name: string +} + export type ProjectOptionalExtended = { name: string title: string @@ -1155,7 +1191,7 
@@ export type UserCreate = { full_name?: string | null password: string account_name?: string | null - github_username?: string + github_username?: string | null } export type UserPublic = { @@ -1164,7 +1200,7 @@ export type UserPublic = { is_superuser?: boolean full_name?: string | null id: string - github_username: string + github_username: string | null subscription: UserSubscription | null } @@ -1786,6 +1822,35 @@ export type DeleteProjectCollaboratorData = { export type DeleteProjectCollaboratorResponse = Message +export type PostProjectInvitationData = { + ownerName: string + projectName: string + requestBody: ProjectInvitationPost +} + +export type PostProjectInvitationResponse = ProjectInvitationCreated + +export type GetProjectInvitationsData = { + ownerName: string + projectName: string +} + +export type GetProjectInvitationsResponse = Array + +export type DeleteProjectInvitationData = { + invitationId: string + ownerName: string + projectName: string +} + +export type DeleteProjectInvitationResponse = Message + +export type PostProjectInvitationRedemptionData = { + token: string +} + +export type PostProjectInvitationRedemptionResponse = ProjectInvitationRedeemed + export type GetProjectIssuesData = { ownerName: string page?: number diff --git a/frontend/src/components/Common/ArtifactCompareModal.tsx b/frontend/src/components/Common/ArtifactCompareModal.tsx index 544f8238..f059043e 100644 --- a/frontend/src/components/Common/ArtifactCompareModal.tsx +++ b/frontend/src/components/Common/ArtifactCompareModal.tsx @@ -496,7 +496,7 @@ function FigureComments({ return ( = invite.max_uses) { + return { label: "Used up", color: "gray" } + } + return { label: "Active", color: "green" } +} + +const CreateInviteModal = ({ + ownerName, + projectName, + isOpen, + onClose, + onCreated, +}: InviteLinksProps & { + isOpen: boolean + onClose: () => void + onCreated: (invite: ProjectInvitationCreated) => void +}) => { + const queryClient = useQueryClient() + const 
showToast = useCustomToast() + const { + register, + handleSubmit, + reset, + formState: { errors, isSubmitting }, + } = useForm({ + mode: "onBlur", + defaultValues: { role: "write", expires_days: "", max_uses: "" }, + }) + + const mutation = useMutation({ + mutationFn: (data: CreateInviteForm) => { + const requestBody: ProjectInvitationPost = { + role: data.role, + expires_days: data.expires_days ? Number(data.expires_days) : null, + max_uses: data.max_uses ? Number(data.max_uses) : null, + } + return ProjectsService.postProjectInvitation({ + ownerName, + projectName, + requestBody, + }) + }, + onSuccess: (invite) => { + showToast("Success!", "Invite link created.", "success") + reset() + onClose() + onCreated(invite) + }, + onError: (err: ApiError) => { + handleError(err, showToast) + }, + onSettled: () => { + queryClient.invalidateQueries({ + queryKey: ["projects", ownerName, projectName, "invitations"], + }) + }, + }) + + const onSubmit: SubmitHandler = (data) => { + mutation.mutate(data) + } + + return ( + + + + Create invite link + + + + Access level + + + + Expires in (days) + + + + Max uses + + + + + + + + + + ) +} + +const InviteLinks = ({ ownerName, projectName }: InviteLinksProps) => { + const queryClient = useQueryClient() + const showToast = useCustomToast() + const createModal = useDisclosure() + const [created, setCreated] = useState(null) + const { isPending, data: invitations } = useQuery({ + queryKey: ["projects", ownerName, projectName, "invitations"], + queryFn: () => + ProjectsService.getProjectInvitations({ ownerName, projectName }), + }) + + const revokeMutation = useMutation({ + mutationFn: (invitationId: string) => + ProjectsService.deleteProjectInvitation({ + ownerName, + projectName, + invitationId, + }), + onSuccess: () => { + showToast("Success!", "Invite link revoked.", "success") + }, + onError: (err: ApiError) => { + handleError(err, showToast) + }, + onSettled: () => { + queryClient.invalidateQueries({ + queryKey: ["projects", 
ownerName, projectName, "invitations"], + }) + }, + }) + + const copyLink = (url: string) => { + navigator.clipboard.writeText(url) + showToast("Copied", "Invite link copied to clipboard.", "success") + } + + return ( + + + Invite links + + + + Share a link to let people join this project — including collaborators + without a GitHub account. + + {created && ( + + + New invite link (copy it now — it won't be shown again): + + + + {created.url} + + } + onClick={() => copyLink(created.url)} + /> + + + )} + + + + + + + + + + + + {isPending ? ( + + + {new Array(5).fill(null).map((_, index) => ( + + ))} + + + ) : ( + + {invitations?.length ? ( + invitations.map((invite) => { + const status = invitationStatus(invite) + return ( + + + + + + + + ) + }) + ) : ( + + + + )} + + )} +
Access Status Uses Expires Actions
+ +
{invite.role_name} + {status.label} + + {invite.use_count} + {invite.max_uses !== null + ? ` / ${invite.max_uses}` + : ""} + + {invite.expires + ? new Date(invite.expires).toLocaleDateString() + : "Never"} + + } + variant="ghost" + color="ui.danger" + isDisabled={invite.revoked} + isLoading={ + revokeMutation.isPending && + revokeMutation.variables === invite.id + } + onClick={() => revokeMutation.mutate(invite.id)} + /> +
+ No invite links yet. +
+
+ +
+ ) +} + +export default InviteLinks diff --git a/frontend/src/routes/_layout/$accountName/$projectName/_layout/collaborators.tsx b/frontend/src/routes/_layout/$accountName/$projectName/_layout/collaborators.tsx index e1d62f67..ce56c40c 100644 --- a/frontend/src/routes/_layout/$accountName/$projectName/_layout/collaborators.tsx +++ b/frontend/src/routes/_layout/$accountName/$projectName/_layout/collaborators.tsx @@ -24,6 +24,7 @@ import { useQuery } from "@tanstack/react-query" import Navbar from "../../../../../components/Common/Navbar" import AddCollaborator from "../../../../../components/Projects/AddCollaborator" +import InviteLinks from "../../../../../components/Projects/InviteLinks" import { ProjectsService } from "../../../../../client" import useAuth from "../../../../../hooks/useAuth" import Delete from "../../../../../components/Common/DeleteAlert" @@ -154,6 +155,7 @@ function Collaborators() { )} + ) } diff --git a/spikes/latex-wasm-busytex/.gitignore b/spikes/latex-wasm-busytex/.gitignore new file mode 100644 index 00000000..336fa8c3 --- /dev/null +++ b/spikes/latex-wasm-busytex/.gitignore @@ -0,0 +1,7 @@ +# Vendored busytex WASM engine + TeX Live data bundles — large binaries, not committed. +# Re-fetch with ./download-assets.sh (see README.md). +vendor/ + +# Generated by run-headless.mjs +out.pdf +out.png diff --git a/spikes/latex-wasm-busytex/README.md b/spikes/latex-wasm-busytex/README.md new file mode 100644 index 00000000..1c03c20c --- /dev/null +++ b/spikes/latex-wasm-busytex/README.md @@ -0,0 +1,57 @@ +# LaTeX → PDF in WASM — compile spike (§8.1) + +Throwaway spike for `LATEX_EDITOR_PLAN.md` Phase 0: prove that LaTeX compiles to PDF +entirely in the browser, and measure cold-start + compile time, before building the editor. + +## Engine & license (Path 1, MIT-clean) + +- Engine: **upstream `busytex/busytex`** WASM build — **TeX Live 2023**, emscripten 3.1.43. 
+- The busytex `.js` glue (`busytex_pipeline.js`, `busytex_worker.js`) is **MIT**; the + compiled `busytex.wasm` and `texlive-*.data` bundles carry TeX Live / LPPL (permissive) + licenses. This is clean to redistribute from an MIT project. +- We deliberately do **not** use TeXlyre's TeX Live 2026 build (`texlyre-busytex`), which is + **AGPL-3.0**. See `LATEX_EDITOR_PLAN.md` §0. +- `main.js` is **our own** thin loader around the MIT worker — no TeXlyre source is used. + +## Run it + +```sh +./download-assets.sh # ~135 MB from busytex GitHub releases (needs gh, authed) +node serve.mjs # http://localhost:8099 (sets COOP/COEP) +``` + +Open http://localhost:8099 and click **Compile sample**. The left pane streams the TeX log; +the right pane renders the produced PDF; the header shows cold-start / compile / total ms. + +## What it does + +- `vendor/busytex_worker.js` (MIT) runs the engine in a Web Worker. +- `main.js` initializes the pipeline, then compiles `sample/main.tex` with the + `pdftex_bibtex8` driver against the `texlive-basic` bundle. +- PDF bytes come back as a `Uint8Array` and are shown via a blob URL in an ` + + + + diff --git a/spikes/latex-wasm-busytex/main.js b/spikes/latex-wasm-busytex/main.js new file mode 100644 index 00000000..4c4dc075 --- /dev/null +++ b/spikes/latex-wasm-busytex/main.js @@ -0,0 +1,106 @@ +// Spike orchestrator: load the MIT busytex worker, compile sample/main.tex to PDF +// entirely in the browser, render it, and report timings. This is OUR OWN thin loader +// (Path 1) around the busytex MIT worker/pipeline glue — no TeXlyre code involved. 
+ +const logEl = document.getElementById('log'); +const statusEl = document.getElementById('status'); +const metricsEl = document.getElementById('metrics'); +const frame = document.getElementById('pdf'); +const runBtn = document.getElementById('run'); + +const now = () => performance.now(); +const ms = (a, b) => `${Math.round(b - a)} ms`; + +function log(line) { + logEl.textContent += line + '\n'; + logEl.scrollTop = logEl.scrollHeight; +} +function setStatus(s) { statusEl.textContent = s; } + +// busytex driver options: pdftex_bibtex8 | xetex_bibtex8_dvipdfmx | luahbtex_bibtex8 | luatex_bibtex8 +const DRIVER = 'pdftex_bibtex8'; +const DATA_PACKAGES = ['texlive-basic.js']; // base TeX Live 2023 filesystem + +let worker; +let tWorkerStart, tInitialized, tCompileStart; + +async function main() { + runBtn.disabled = true; + logEl.textContent = ''; + metricsEl.textContent = ''; + frame.removeAttribute('src'); + setStatus('loading engine…'); + + const texSource = await (await fetch('sample/main.tex')).text(); + + tWorkerStart = now(); + // Worker lives in vendor/ so its importScripts('busytex_pipeline.js') and the + // bare asset filenames below all resolve relative to vendor/. + worker = new Worker('vendor/busytex_worker.js'); + + worker.onmessage = ({ data }) => { + if (data.print !== undefined) { log(data.print); return; } + + if (data.initialized !== undefined) { + tInitialized = now(); + setStatus('engine ready — compiling…'); + log(`\n=== engine initialized in ${ms(tWorkerStart, tInitialized)} ===`); + log('applet versions: ' + JSON.stringify(data.initialized) + '\n'); + tCompileStart = now(); + worker.postMessage({ + files: [{ path: 'main.tex', contents: texSource }], + main_tex_path: 'main.tex', + bibtex: null, // auto-detect; sample has no bibliography -> single pdflatex pass + verbose: 'silent', + driver: DRIVER, + data_packages_js: DATA_PACKAGES, + }); + return; + } + + if (data.exception !== undefined) { + setStatus('FAILED'); + log('\n!!! 
EXCEPTION:\n' + data.exception); + runBtn.disabled = false; + return; + } + + // Otherwise: the compile result {pdf, log, exit_code, logs} + const tDone = now(); + const ok = data.exit_code === 0 && data.pdf; + setStatus(ok ? 'compiled ✓' : `compile failed (exit ${data.exit_code})`); + metricsEl.innerHTML = [ + `engine cold-start: ${ms(tWorkerStart, tInitialized)}`, + `compile: ${ms(tCompileStart, tDone)}`, + `total: ${ms(tWorkerStart, tDone)}`, + `pdf size: ${data.pdf ? (data.pdf.byteLength / 1024).toFixed(1) + ' KB' : 'none'}`, + ].join('  |  '); + + if (ok) { + const blob = new Blob([data.pdf], { type: 'application/pdf' }); + frame.src = URL.createObjectURL(blob); + } else { + log('\n=== compile log ===\n' + (data.log || '(no log)')); + } + runBtn.disabled = false; + worker.terminate(); + }; + + worker.onerror = (e) => { + setStatus('worker error'); + log(`\n!!! worker error: ${e.message} @ ${e.filename}:${e.lineno}`); + runBtn.disabled = false; + }; + + // Initialize the pipeline (paths are relative to the worker's vendor/ dir). + worker.postMessage({ + busytex_wasm: 'busytex.wasm', + busytex_js: 'busytex.js', + preload_data_packages_js: DATA_PACKAGES, + data_packages_js: DATA_PACKAGES, + texmf_local: [], + preload: true, + }); +} + +runBtn.addEventListener('click', main); diff --git a/spikes/latex-wasm-busytex/run-headless.mjs b/spikes/latex-wasm-busytex/run-headless.mjs new file mode 100644 index 00000000..0769c6a0 --- /dev/null +++ b/spikes/latex-wasm-busytex/run-headless.mjs @@ -0,0 +1,51 @@ +// Headless driver for the spike: loads the page, clicks Compile, waits for the +// result, prints metrics + a PDF artifact. Uses the frontend's playwright + system Chrome. 
+import pw from '../../frontend/node_modules/playwright-core/index.js'; // relative to this file; assumes frontend/ deps are installed +import { writeFileSync } from 'node:fs'; +const { chromium } = pw; + +const PAGE_URL = process.env.URL || 'http://localhost:8099/'; +const browser = await chromium.launch({ channel: 'chrome', headless: true }); +const page = await browser.newPage(); +page.on('console', (m) => console.log(' [page]', m.text())); +page.on('pageerror', (e) => console.log(' [pageerror]', e.message)); + +await page.goto(PAGE_URL, { waitUntil: 'load' }); + +// Hooking Blob URL creation to capture the PDF bytes isn't trivial; instead the bytes are +// fetched back from the iframe's blob URL after the compile (see below). This global is unused. +await page.evaluate(() => { window.__lastPdfLen = 0; }); + +await page.click('#run'); + +const deadline = Date.now() + 240_000; +let status = ''; +while (Date.now() < deadline) { + status = await page.textContent('#status'); + if (/compiled|failed|FAILED|error/.test(status)) break; + await page.waitForTimeout(500); +} + +const metrics = (await page.textContent('#metrics'))?.replace(/ /g, ' ').replace(/\s+/g, ' ').trim(); +console.log('\nSTATUS :', status); +console.log('METRICS:', metrics || '(none)'); + +// Pull the PDF from the iframe blob URL into the page and out to Node. +const pdfB64 = await page.evaluate(async () => { + const src = document.getElementById('pdf').getAttribute('src'); + if (!src) return null; + const buf = await (await fetch(src)).arrayBuffer(); + let bin = ''; const bytes = new Uint8Array(buf); + for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]); + return btoa(bin); +}); +if (pdfB64) { + writeFileSync('out.pdf', Buffer.from(pdfB64, 'base64')); + console.log('PDF : wrote out.pdf (' + (pdfB64.length * 0.75 / 1024).toFixed(1) + ' KB)'); +} +await page.screenshot({ path: 'out.png', fullPage: false }); + +await browser.close(); +const ok = /compiled/.test(status) && !!pdfB64; +console.log('\nRESULT :', ok ? 
'PASS ✓' : 'FAIL ✗'); +process.exit(ok ? 0 : 1); diff --git a/spikes/latex-wasm-busytex/sample/main.tex b/spikes/latex-wasm-busytex/sample/main.tex new file mode 100644 index 00000000..e8f0bce9 --- /dev/null +++ b/spikes/latex-wasm-busytex/sample/main.tex @@ -0,0 +1,25 @@ +\documentclass{article} +\usepackage{amsmath} +\usepackage{graphicx} +\usepackage{hyperref} + +\title{Calkit LaTeX Editor --- WASM Compile Spike} +\author{busytex (TeX Live 2023) in the browser} +\date{\today} + +\begin{document} +\maketitle + +\section{It compiles} +This PDF was produced entirely client-side by the \texttt{busytex} WebAssembly +build of pdf\LaTeX{}, with no compile server. A representative equation: +\begin{equation} + \int_{0}^{\infty} e^{-x^2}\,\mathrm{d}x = \frac{\sqrt{\pi}}{2}. +\end{equation} + +\section{Why this spike exists} +To confirm in-browser LaTeX compilation is viable for the editor preview, and to +measure cold-start and compile time before building the editor UI. See +\href{https://example.invalid}{the plan} for context. + +\end{document} diff --git a/spikes/latex-wasm-busytex/serve.mjs b/spikes/latex-wasm-busytex/serve.mjs new file mode 100644 index 00000000..53f17b75 --- /dev/null +++ b/spikes/latex-wasm-busytex/serve.mjs @@ -0,0 +1,47 @@ +// Minimal static server for the spike. +// Sets COOP/COEP (cross-origin isolation) in case the emscripten build wants +// SharedArrayBuffer, and serves .wasm/.data with sane types. Node >= 18, no deps. +import { createServer } from 'node:http'; +import { readFile, stat } from 'node:fs/promises'; +import { extname, join, normalize } from 'node:path'; +import { fileURLToPath } from 'node:url'; + +const ROOT = fileURLToPath(new URL('.', import.meta.url)); +const PORT = process.env.PORT ? 
Number(process.env.PORT) : 8099; + +const TYPES = { + '.html': 'text/html; charset=utf-8', + '.js': 'text/javascript; charset=utf-8', + '.mjs': 'text/javascript; charset=utf-8', + '.json': 'application/json', + '.wasm': 'application/wasm', + '.data': 'application/octet-stream', + '.tex': 'text/plain; charset=utf-8', + '.pdf': 'application/pdf', +}; + +createServer(async (req, res) => { + // Cross-origin isolation — harmless when unused, required if the engine uses threads. + res.setHeader('Cross-Origin-Opener-Policy', 'same-origin'); + res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp'); + res.setHeader('Cross-Origin-Resource-Policy', 'same-origin'); + + try { + const urlPath = decodeURIComponent((req.url || '/').split('?')[0]); + const rel = normalize(urlPath === '/' ? '/index.html' : urlPath).replace(/^(\.\.[/\\])+/, ''); + const filePath = join(ROOT, rel); + if (!filePath.startsWith(ROOT)) { res.writeHead(403).end('forbidden'); return; } + + const info = await stat(filePath); + if (info.isDirectory()) { res.writeHead(403).end('forbidden'); return; } + + const body = await readFile(filePath); + res.setHeader('Content-Type', TYPES[extname(filePath)] || 'application/octet-stream'); + res.setHeader('Content-Length', info.size); + res.writeHead(200).end(body); + } catch (err) { + res.writeHead(err.code === 'ENOENT' ? 404 : 500).end(String(err)); + } +}).listen(PORT, () => { + console.log(`Spike server on http://localhost:${PORT} (Ctrl-C to stop)`); +});
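
Review note on `serve.mjs`: the path-traversal guard there normalizes the URL path and then checks that the joined result still starts with `ROOT`. One way to make that logic easier to unit-test is to isolate it as a pure function. The sketch below is only an illustration under stated assumptions: `resolveSafePath` is a hypothetical helper that does not exist in the spike, and it is deliberately stricter than the server, rejecting any path that tries to climb above the root rather than clamping it.

```javascript
// Hypothetical helper (NOT part of the spike): map a request URL to a
// root-relative file path, or return null if the path is malformed or
// would escape the served directory.
function resolveSafePath(url) {
  let decoded;
  try {
    // Drop the query string, then decode percent-escapes such as %2e.
    decoded = decodeURIComponent(url.split('?')[0]);
  } catch {
    return null; // malformed percent-encoding
  }
  const segments = [];
  // Treat both '/' and '\' as separators so Windows-style input is covered.
  for (const seg of decoded.split(/[/\\]+/)) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') {
      if (segments.length === 0) return null; // would climb above the root
      segments.pop();
      continue;
    }
    segments.push(seg);
  }
  // Same default document as serve.mjs uses for '/'.
  return segments.length === 0 ? 'index.html' : segments.join('/');
}
```

A server could join the returned value onto its root directory and answer 403 when the helper returns `null`; whether rejecting is preferable to the clamping behavior in `serve.mjs` is a judgment call for a local-only spike.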