feat(zotero): add Zotero connector #61
Open
thiswillbeyourgithub wants to merge 9 commits into
Extracts text from PDF attachments in a Zotero collection (and its subcollections) and exposes them as .txt files for sync. Collection hierarchy maps to KB directories. Read-only with respect to Zotero. Checksum mode is configurable via ZOTERO_CHECKSUM: 'version' (cheap, default) or 'content' (hashes extracted text). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
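As a rough illustration of how a collection hierarchy could map onto directory paths, here is a minimal sketch. It assumes collections arrive as dictionaries shaped like Zotero API collection objects (`key`, `data.name`, `data.parentCollection`, with `False` for top-level collections). The function name `collection_paths` and its `root_key` parameter are hypothetical and are not taken from the connector's code.

```python
from typing import Optional


def collection_paths(collections: list, root_key: Optional[str] = None) -> dict:
    """Map each collection key below root_key to a slash-separated directory path.

    Hypothetical helper: paths are relative to the synced root, so the root
    collection itself does not appear in the result.
    """
    # Group collections by parent key; top-level collections use None.
    children: dict = {}
    for coll in collections:
        parent = coll["data"].get("parentCollection") or None
        children.setdefault(parent, []).append(coll)

    paths: dict = {}

    def walk(parent_key: Optional[str], prefix: str) -> None:
        # Visit children in name order so the output is deterministic.
        for coll in sorted(children.get(parent_key, []), key=lambda c: c["data"]["name"]):
            name = coll["data"]["name"]
            path = f"{prefix}/{name}" if prefix else name
            paths[coll["key"]] = path
            walk(coll["key"], path)

    walk(root_key, "")
    return paths


example = [
    {"key": "A", "data": {"name": "Research", "parentCollection": False}},
    {"key": "B", "data": {"name": "Machine Learning", "parentCollection": "A"}},
    {"key": "C", "data": {"name": "Misc", "parentCollection": False}},
]
all_paths = collection_paths(example)
under_research = collection_paths(example, root_key="A")
```

Here `all_paths` covers every top-level collection and its descendants, while `under_research` contains only the subcollections of `Research`, with paths relative to it.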
Personal, untracked scripts and configs (e.g. the Zotero sync cheatsheet) live in perso/ and should not be tracked. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The diff's mkdir list had no guaranteed order, so a nested directory could be processed before its parent. Since the parent's id is looked up from directory_map (populated as dirs are created), an out-of-order child would get parent_id=None and be created at the wrong level. Sorting lexicographically puts "a" before "a/b", guaranteeing parent-first creation, and also makes the dry-run "Dirs to create" output stable and readable. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
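The ordering argument can be sketched as follows. This is a simplified stand-in, not the project's code: `plan_dir_creation` and the integer ids are hypothetical, and only `directory_map` and `parent_id` are names taken from the commit message. A parent path is always a strict prefix of its children, so a plain lexicographic sort places it first.

```python
from typing import Optional


def plan_dir_creation(dirs_to_create: list) -> dict:
    """Return {path: parent_id} after creating directories parent-first.

    Hypothetical sketch: ids are assigned locally instead of by a server.
    """
    directory_map: dict = {}  # path -> id, filled as directories are "created"
    parents: dict = {}
    next_id = 1
    for path in sorted(dirs_to_create):  # "a" sorts before "a/b"
        parent_path = path.rsplit("/", 1)[0] if "/" in path else None
        # The parent was created earlier in the loop, so the lookup succeeds.
        parent_id: Optional[int] = directory_map.get(parent_path) if parent_path else None
        directory_map[path] = next_id
        parents[path] = parent_id
        next_id += 1
    return parents


plan = plan_dir_creation(["a/b/c", "a/b", "a", "a-b"])
```

Without the `sorted` call, the input order above would process `a/b/c` before `a/b` and give it a `parent_id` of `None`.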
When a Zotero attachment's bytes aren't in zotero.org storage (file sync off, storage quota exceeded, WebDAV-only, or a web link), the /file endpoint returns 404. In 'version' checksum mode this already surfaced per-file at upload time and the rest of the sync completed. But in 'content' mode the download happens during build_manifest, so a single 404 propagated out of run_sync and aborted the entire run with nothing uploaded. Make _checksum degrade to the version checksum when text can't be retrieved, so the manifest is always built and the failure is reported through the normal per-file upload error path (sync everything we can, list what failed at the end). Also wrap the file-download failure in a clear message explaining the bytes aren't in Zotero storage instead of dumping a raw HTTP 404. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
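A minimal sketch of the fallback described above, under stated assumptions: the real `_checksum` signature is not shown in this PR text, so `checksum_sketch`, its parameters, and the use of SHA-256 are all illustrative choices, not the connector's implementation.

```python
import hashlib
from typing import Callable


def checksum_sketch(item_version: int, mode: str, fetch_text: Callable[[], str]) -> str:
    """Return a checksum for one attachment (hypothetical helper).

    'version' mode hashes the item version and never downloads anything.
    'content' mode hashes the extracted text, but degrades to the version
    checksum if the text cannot be retrieved.
    """
    version_sum = hashlib.sha256(str(item_version).encode("utf-8")).hexdigest()
    if mode != "content":
        return version_sum
    try:
        text = fetch_text()
    except Exception:
        # The bytes are unavailable; keep the manifest buildable and let the
        # per-file upload path report the failure later.
        return version_sum
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def missing_file() -> str:
    # Stand-in for a download that fails because the bytes are not in storage.
    raise RuntimeError("file not in Zotero storage")


ok = checksum_sketch(7, "content", lambda: "hello")
degraded = checksum_sketch(7, "content", missing_file)
cheap = checksum_sketch(7, "version", missing_file)
```

In this sketch a failed download in `content` mode yields the same value as `version` mode, so one missing file no longer prevents the manifest from being built.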
A file the source advertises in its manifest but cannot actually provide content for (e.g. a Zotero attachment whose bytes aren't in storage: a web link, linked file, or WebDAV-only item) is a source-side data gap, not an oikb failure. Previously any such case landed in result.errors and made the whole sync exit 1, so a library with a few file-less attachments could never report success (a recurring systemd/daemon run would always look "failed"). Introduce a dedicated SourceFileUnavailable exception in the connectors base. The Zotero connector raises it (instead of a bare RuntimeError) for the no-downloadable-file case. run_sync catches it specifically (no retry) and routes the file to a new result.warnings list instead of result.errors; the CLI and daemon surface warnings but only result.errors affects the exit code and success/partial status. Any other read_file() exception still fails the run exactly as before, so this narrows to that one error class only. Tests cover all three paths: the Zotero missing-file mapping, warning routing in run_sync, and that a generic read failure is still an error. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
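The routing can be illustrated with a simplified stand-in for the sync loop. Only `SourceFileUnavailable`, `read_file`, `warnings`, and `errors` are names from this PR; `SyncResult`, `uploaded`, `exit_code`, and `run_sync_sketch` are hypothetical, and retries and the actual upload are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


class SourceFileUnavailable(Exception):
    """The source lists a file but cannot provide its content."""


@dataclass
class SyncResult:
    uploaded: list = field(default_factory=list)
    warnings: list = field(default_factory=list)
    errors: list = field(default_factory=list)

    @property
    def exit_code(self) -> int:
        # Only errors affect the exit code; warnings are reported but not fatal.
        return 1 if self.errors else 0


def run_sync_sketch(paths: Iterable[str], read_file: Callable[[str], bytes]) -> SyncResult:
    result = SyncResult()
    for path in paths:
        try:
            read_file(path)
        except SourceFileUnavailable as exc:
            # Source-side data gap: record a warning and move on, no retry.
            result.warnings.append(f"{path}: {exc}")
        except Exception as exc:
            # Any other read failure is still an error.
            result.errors.append(f"{path}: {exc}")
        else:
            result.uploaded.append(path)
    return result


def fake_read(path: str) -> bytes:
    if path == "link.txt":
        raise SourceFileUnavailable("no downloadable file")
    if path == "broken.txt":
        raise OSError("disk read failed")
    return b"text"


warn_only = run_sync_sketch(["paper.txt", "link.txt"], fake_read)
with_error = run_sync_sketch(["paper.txt", "broken.txt"], fake_read)
```

Under these assumptions `warn_only` finishes with exit code 0 and one warning, while `with_error` still exits with code 1.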
thiswillbeyourgithub added a commit to thiswillbeyourgithub/openwebui-knowledge-zotero-sync that referenced this pull request on Jun 22, 2026
The Zotero connector port has been opened as a draft PR against the official open-webui/oikb project (open-webui/oikb#61). Update the deprecation notice to point at it; this repo will be sunset once it lands. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Adds a Zotero connector to oikb. It syncs the text of the PDF attachments of items in a Zotero collection (and its subcollections) into an Open WebUI Knowledge Base, mapping the Zotero collection hierarchy onto KB directories. It is read-only with respect to Zotero: it never modifies, deletes, or adds anything to the library.
This is a Claude Code port of an earlier standalone tool I wrote, https://github.com/thiswillbeyourgithub/openwebui-knowledge-zotero-sync, reworked to fit oikb's connector interface.
Zotero connector
- `zotero:<hierarchy>` source scheme. `%%` separates collection names (e.g. `zotero:Research%%Machine Learning`); a bare `zotero:` syncs every top-level collection.
- `ZOTERO_CHECKSUM`: `version` (cheap, hashes the Zotero item version, no download, default) or `content` (hashes the extracted text).
- `ZOTERO_EXCLUDE` skips subcollections, relative to the synced root.
- Environment variables: `ZOTERO_LIBRARY_ID`, `ZOTERO_API_KEY`, `ZOTERO_LIBRARY_TYPE`, `ZOTERO_CHECKSUM`, `ZOTERO_EXCLUDE`.
- Install with `pip install oikb[zotero]` (pyzotero + pymupdf).
Sync change: unavailable source files are warnings, not fatal errors
A file the source advertises in its manifest but cannot provide content for (e.g. a Zotero attachment whose bytes aren't in storage: a web link, a linked file, or a WebDAV-only item) is a source-side data gap, not an oikb failure. Previously any such case landed in `result.errors` and exited the whole sync with code 1.
This PR adds a `SourceFileUnavailable` exception to the connector base. `run_sync` catches it specifically (no retry) and routes the file to a new `result.warnings` list instead of `result.errors`, so a sync whose only problems are unavailable source files still succeeds (exit 0). Any other `read_file()` exception still fails the run exactly as before. The CLI and daemon surface warnings separately from errors; only `result.errors` affects the exit code and success status.
Tests
`tests/test_sync_warnings.py` covers all three paths: the Zotero missing-file mapping to `SourceFileUnavailable`, warning routing in `run_sync`, and that a generic read failure is still an error.
Notes