1 change: 1 addition & 0 deletions .gitignore
@@ -5,6 +5,7 @@ __pycache__/
dist/
build/
.venv/
.env
.eggs/
*.egg
.pytest_cache/
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,18 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## [0.3.8] - 2026-06-03

### Added

- **Gitea**: wildcard owner/org source via `gitea:owner/*` to sync every repository under an owner into one Knowledge Base. Files are prefixed by repository name to avoid path collisions.

## [0.3.7] - 2026-06-02

### Added

- **Gitea**: repository source connector via `gitea:owner/repo`, configurable with `GITEA_URL` and optional `GITEA_TOKEN`. Supports branch and subdirectory scoping, uses Gitea Git tree blob SHAs for incremental sync checksums, and syncs files through the raw content API.

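The blob SHAs mentioned in the 0.3.7 entry are Git object IDs, which can be reproduced locally. The sketch below assumes a repository using Git's default SHA-1 object format; `git_blob_sha` and `needs_upload` are hypothetical helper names for illustration, not functions from oikb.

```python
from __future__ import annotations

import hashlib


def git_blob_sha(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def needs_upload(remote_sha: str, stored_checksum: str | None) -> bool:
    """Hypothetical check: upload when the file is new or its blob SHA changed."""
    return stored_checksum != remote_sha


print(git_blob_sha(b"hello world\n"))
```

Because the ID depends only on the file's bytes, comparing it with a previously stored value is enough to detect a change without downloading the file.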
## [0.3.6] - 2026-05-28

### Added
8 changes: 5 additions & 3 deletions README.md
@@ -1,6 +1,6 @@
# 📚 oikb

-Keep your [Open WebUI](https://github.com/open-webui/open-webui) Knowledge Bases in sync. Point it at a local directory, a GitHub repo, a Confluence space, an S3 bucket, or any of 44 supported sources. Only new and modified files are uploaded via incremental SHA-256 diffing.
+Keep your [Open WebUI](https://github.com/open-webui/open-webui) Knowledge Bases in sync. Point it at a local directory, a GitHub repo, a Confluence space, an S3 bucket, or any of 45 supported sources. Only new and modified files are uploaded via incremental SHA-256 diffing.

> [!IMPORTANT]
> Requires **Open WebUI 0.9.6+**
@@ -134,11 +134,11 @@ services:
timeout: 5s
```

-## 44 Connectors
+## 45 Connectors

| Category | Sources |
|---|---|
-| **Code Repos** | GitHub, GitLab, Bitbucket |
+| **Code Repos** | GitHub, Gitea, GitLab, Bitbucket |
| **Cloud Storage** | S3, GCS, Azure Blob, Dropbox, R2, Google Drive, SharePoint, Egnyte, Oracle Cloud |
| **Wikis & KBs** | Confluence, Notion, BookStack, Discourse, GitBook, Guru, Outline, Slab, Document360, DokuWiki, Google Sites |
| **Ticketing** | Jira, Linear, Zendesk, Freshdesk, Asana, ClickUp, Airtable, ServiceNow, ProductBoard |
@@ -150,6 +150,8 @@ services:

```bash
oikb sync github:owner/repo --kb-id your-kb-id
GITEA_URL=https://gitea.example.com oikb sync gitea:owner/repo --kb-id your-kb-id
GITEA_URL=https://gitea.example.com oikb sync 'gitea:owner/*' --kb-id your-kb-id
oikb sync confluence:ENG --kb-id your-kb-id
oikb sync s3://bucket/prefix --kb-id your-kb-id
oikb sync servicenow:incident --kb-id your-kb-id
40 changes: 38 additions & 2 deletions docs/guide.md
@@ -18,6 +18,7 @@ A complete guide to syncing content into Open WebUI Knowledge Bases.
- [Sources](#sources)
- [Local Directories](#local-directories)
- [GitHub](#github)
- [Gitea](#gitea)
- [GitLab / Bitbucket](#gitlab--bitbucket)
- [Confluence](#confluence)
- [Cloud Storage (S3 / GCS / Azure)](#cloud-storage-s3--gcs--azure)
@@ -232,6 +233,41 @@ sources:
path: docs/
```

### Gitea

```bash
export GITEA_URL=https://gitea.example.com
export GITEA_TOKEN=your-token # required for private repos
oikb sync gitea:owner/repo --kb-id your-kb-id
```

Because Gitea instances are self-hosted, `GITEA_URL` is required. Set `GITEA_TOKEN` for private repositories or higher API limits. Like the GitHub connector, Gitea syncs the repository's default branch unless you pass `--branch`, and `--path` limits the sync to a subdirectory:

```bash
oikb sync gitea:owner/repo --branch main --path docs/
```

To sync every repository owned by a Gitea user or organization, use `*` as the repository name. Quote the source so the shell does not expand the `*`. Files are stored under a repository-name prefix to prevent path collisions:

```bash
oikb sync 'gitea:owner/*' --kb-id your-kb-id
```

Or in `.oikb.yaml`:

```yaml
sources:
  - name: gitea-docs
    source: gitea:myorg/docs
    kb-id: abc123
    branch: main
    path: docs/

  - name: all-gitea-repos
    source: gitea:myorg/*
    kb-id: abc123
```
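The repository-name prefix used for wildcard sources can be sketched as a pair of small functions. This is a minimal illustration modeled on the connector code in this change; `prefix_dir` and `split_prefixed` are hypothetical names, not part of the oikb API.

```python
from __future__ import annotations


def prefix_dir(repo: str, dir_path: str) -> str:
    """Place a file's directory under its repository name."""
    return f"{repo}/{dir_path}" if dir_path else repo


def split_prefixed(path: str, filename: str) -> tuple[str, str]:
    """Recover (repo, path inside the repo) from a prefixed manifest entry."""
    full = f"{path}/{filename}" if path else filename
    repo, _, file_path = full.partition("/")
    if not repo or not file_path:
        raise ValueError(f"Invalid wildcard path: {full}")
    return repo, file_path


# Two repositories that both contain README.md no longer collide.
print(prefix_dir("docs", ""), prefix_dir("website", ""))
print(split_prefixed("docs/api", "index.md"))
```

The first path segment always identifies the repository, so the original location of a file can be recovered when it is downloaded later.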

### GitLab / Bitbucket

```bash
@@ -316,11 +352,11 @@ sources:

### All Connectors

-44 connectors available. See the full list:
+45 connectors available. See the full list:

| Category | Sources |
|---|---|
-| **Git** | GitHub, GitLab, Bitbucket |
+| **Git** | GitHub, Gitea, GitLab, Bitbucket |
| **Cloud Storage** | S3, GCS, Azure Blob, Dropbox, R2, Google Drive, SharePoint, Egnyte, Oracle Cloud |
| **Wikis & KBs** | Confluence, Notion, BookStack, Discourse, GitBook, Guru, Outline, Slab, Document360, DokuWiki, Google Sites |
| **Ticketing** | Jira, Linear, Zendesk, Freshdesk, Asana, ClickUp, Airtable, ServiceNow, ProductBoard |
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "oikb"
-version = "0.3.6"
+version = "0.3.8"
description = "Sync anything to Open WebUI Knowledge Bases"
readme = "README.md"
authors = [
2 changes: 1 addition & 1 deletion src/oikb/__init__.py
@@ -1,3 +1,3 @@
"""oikb — Open WebUI Knowledge Base CLI."""

-__version__ = "0.3.5"
+__version__ = "0.3.8"
11 changes: 8 additions & 3 deletions src/oikb/cli.py
@@ -50,6 +50,11 @@ def _resolve_connector(source: str, branch: str | None = None, path: str | None
        parsed = parse_github_source(source)
        return GitHubConnector(owner=parsed["owner"], repo=parsed["repo"], branch=branch, path=path or parsed.get("path"))

    if source.startswith("gitea:"):
        from oikb.connectors.gitea import GiteaConnector, parse_gitea_source
        parsed = parse_gitea_source(source)
        return GiteaConnector(owner=parsed["owner"], repo=parsed["repo"], branch=branch, path=path or parsed.get("path"))

    if source.startswith("gitlab:"):
        from oikb.connectors.gitlab import GitLabConnector, parse_gitlab_source
        parsed = parse_gitlab_source(source)
@@ -349,7 +354,7 @@ def _build_cli_filter(max_file_size: str | None):
@cli.command()
@click.argument("source", required=False)
@common_options
-@click.option("--branch", default=None, help="Branch for GitHub sources.")
+@click.option("--branch", default=None, help="Branch for Git repository sources.")
@click.option("--path", "source_path", default=None, help="Subdirectory within the source.")
@click.option("--dry-run", is_flag=True, help="Preview changes without uploading.")
@click.option("-v", "--verbose", is_flag=True, help="Show detailed progress.")
@@ -506,7 +511,7 @@ def sync(
@cli.command()
@click.argument("source")
@common_options
-@click.option("--branch", default=None, help="Branch for GitHub sources.")
+@click.option("--branch", default=None, help="Branch for Git repository sources.")
@click.option("--path", "source_path", default=None, help="Subdirectory within the source.")
@click.option("-v", "--verbose", is_flag=True, help="Show detailed output.")
@click.pass_context
@@ -713,7 +718,7 @@ def status(url: str | None, token: str | None, kb: str | None):

    try:
        info = client.get_kb(kb)
-        files = info.get("files", [])
+        files = info.get("files") or []
    except Exception as e:
        click.echo(click.style(f"Failed: {e}", fg="red"), err=True)
        sys.exit(1)
2 changes: 1 addition & 1 deletion src/oikb/client.py
@@ -135,4 +135,4 @@ def list_kb_files(self, kb_id: str) -> list[dict[str, Any]]:
        resp = self._http.get(f"/knowledge/{kb_id}")
        resp.raise_for_status()
        data = resp.json()
-        return data.get("files", [])
+        return data.get("files") or []
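The switch from `.get("files", [])` to `.get("files") or []` matters when the key is present but holds `None`, which is how a JSON `null` is decoded. The sketch below shows the difference in plain Python; the payload is a made-up example, not a captured Open WebUI response.

```python
from __future__ import annotations

from typing import Any


def files_with_default(data: dict[str, Any]) -> Any:
    """Earlier pattern: the default applies only when the key is absent."""
    return data.get("files", [])


def files_or_empty(data: dict[str, Any]) -> list[Any]:
    """Newer pattern: a missing key and an explicit None both yield []."""
    return data.get("files") or []


# Hypothetical response body in which "files" is null.
null_payload = {"id": "kb-1", "files": None}

print(files_with_default(null_payload))  # the stored None comes back
print(files_or_empty(null_payload))      # an empty list comes back
```

With the earlier pattern, a later `len(files)` or `for f in files` would raise `TypeError` on such a payload.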
217 changes: 217 additions & 0 deletions src/oikb/connectors/gitea.py
@@ -0,0 +1,217 @@
"""Gitea connector — sync Gitea repos to a Knowledge Base via the API.

Uses the Gitea Git Trees API for checksums (blob SHAs) — no local clone needed.
Set GITEA_URL to your instance URL, and GITEA_TOKEN for private repositories.
"""

from __future__ import annotations

import os

import httpx

from oikb.connectors import BaseConnector, ManifestEntry


class GiteaConnector(BaseConnector):
    """Sync files from one Gitea repository, or all repos for an owner.

    Args:
        owner: Repository owner or organization.
        repo: Repository name, or "*" for all repos owned by owner.
        branch: Branch to sync from (default: repo default branch).
        path: Subdirectory to scope to (e.g. "docs/").
        token: Gitea personal access token (or GITEA_TOKEN env var).
        base_url: Gitea instance URL (or GITEA_URL env var).
    """

    def __init__(
        self,
        owner: str,
        repo: str,
        branch: str | None = None,
        path: str | None = None,
        token: str | None = None,
        base_url: str | None = None,
    ):
        self.owner = owner
        self.repo = repo
        self.branch = branch
        self.path = path.strip("/") if path else None
        self._token = token or os.environ.get("GITEA_TOKEN")
        self._base_url = (base_url or os.environ.get("GITEA_URL") or "").rstrip("/")
        self._default_branches: dict[str, str] = {}

        if not self._base_url:
            raise ValueError("GITEA_URL is required for gitea: sources (e.g. https://gitea.example.com)")

        headers: dict[str, str] = {"Accept": "application/json"}
        if self._token:
            headers["Authorization"] = f"token {self._token}"

        self._http = httpx.Client(
            base_url=f"{self._base_url}/api/v1",
            headers=headers,
            timeout=60.0,
        )

    def build_manifest(self) -> list[ManifestEntry]:
        """Fetch the repo tree and build a manifest.

        Gitea paginates the recursive tree endpoint. Blob SHAs are used as
        checksums because they are content-addressable hashes.
        """
        if self._all_repos:
            entries: list[ManifestEntry] = []
            for repo in self._list_repos():
                entries.extend(self._build_repo_manifest(repo, prefix_repo=True))
            entries.sort(key=lambda e: e.display_path)
            return entries

        return self._build_repo_manifest(self.repo)

    def _build_repo_manifest(self, repo: str, prefix_repo: bool = False) -> list[ManifestEntry]:
        """Fetch one repo tree and build manifest entries."""
        ref = self.branch or self._get_default_branch(repo)
        entries: list[ManifestEntry] = []
        seen_items = 0
        page = 1

        while True:
            resp = self._http.get(
                f"/repos/{self.owner}/{repo}/git/trees/{ref}",
                params={"recursive": "true", "per_page": 100, "page": page},
            )
            resp.raise_for_status()
            tree = resp.json()
            items = tree.get("tree", [])

            if not items:
                break

            seen_items += len(items)

            for item in items:
                if item.get("type") != "blob":
                    continue

                file_path = item["path"]

                # Filter by path prefix if specified.
                if self.path:
                    if not file_path.startswith(self.path + "/"):
                        continue
                    # Strip the prefix so paths are relative to the scoped dir.
                    file_path = file_path[len(self.path) + 1 :]

                parts = file_path.rsplit("/", 1)
                if len(parts) == 2:
                    dir_path, filename = parts
                else:
                    dir_path, filename = "", parts[0]

                if prefix_repo:
                    dir_path = f"{repo}/{dir_path}" if dir_path else repo

                entries.append(
                    ManifestEntry(
                        filename=filename,
                        path=dir_path,
                        checksum=item["sha"],  # Git blob SHA — content-addressable.
                        size=item.get("size", 0),
                    )
                )

            total_count = tree.get("total_count")
            if total_count is not None:
                if seen_items >= total_count:
                    break
            elif len(items) < 100:
                break

            page += 1

        entries.sort(key=lambda e: e.display_path)
        return entries

    def read_file(self, path: str, filename: str) -> bytes:
        """Download a file's raw content via the Gitea raw file endpoint."""
        file_path = f"{path}/{filename}" if path else filename
        repo = self.repo

        if self._all_repos:
            repo, _, file_path = file_path.partition("/")
            if not repo or not file_path:
                raise ValueError(f"Invalid wildcard Gitea path: {path}/{filename}")

        if self.path:
            file_path = f"{self.path}/{file_path}"

        ref = self.branch or self._get_default_branch(repo)

        resp = self._http.get(
            f"/repos/{self.owner}/{repo}/raw/{file_path}",
            params={"ref": ref},
        )
        resp.raise_for_status()
        return resp.content

    @property
    def _all_repos(self) -> bool:
        return self.repo == "*"

    def _get_default_branch(self, repo: str) -> str:
        """Fetch and cache the repo's default branch name."""
        if repo not in self._default_branches:
            resp = self._http.get(f"/repos/{self.owner}/{repo}")
            resp.raise_for_status()
            self._default_branches[repo] = resp.json()["default_branch"]
        return self._default_branches[repo]

    def _list_repos(self) -> list[str]:
        """List repositories for the configured owner or organization."""
        repos: list[str] = []
        page = 1

        while True:
            resp = self._http.get(f"/orgs/{self.owner}/repos", params={"page": page, "limit": 50})
            if resp.status_code == 404:
                resp = self._http.get(f"/users/{self.owner}/repos", params={"page": page, "limit": 50})
            resp.raise_for_status()
            items = resp.json()

            if not items:
                break

            repos.extend(repo["name"] for repo in items)

            if len(items) < 50:
                break
            page += 1

        repos.sort()
        return repos

    def close(self) -> None:
        self._http.close()


def parse_gitea_source(source: str) -> dict[str, str | None]:
    """Parse a gitea:owner/repo[/path] source string.

    Examples:
        gitea:myorg/docs
        gitea:myorg/docs/api
        gitea:myorg/*
    """
    source = source.removeprefix("gitea:")

    parts = source.split("/", 2)
    if len(parts) < 2:
        raise ValueError(f"Invalid Gitea source: {source}. Expected: gitea:owner/repo")

    owner = parts[0]
    repo = parts[1]
    path = parts[2] if len(parts) > 2 else None

    return {"owner": owner, "repo": repo, "path": path}
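To see how source strings are split, the parser can be exercised on its own. The block below is a standalone copy of `parse_gitea_source` as it appears in this diff (so it runs without installing oikb), followed by a few illustrative inputs; the `myorg` names are placeholders.

```python
from __future__ import annotations


def parse_gitea_source(source: str) -> dict[str, str | None]:
    """Standalone copy of the parser from this change, for experimentation."""
    source = source.removeprefix("gitea:")
    # Split at most twice so everything after owner/repo stays in the path.
    parts = source.split("/", 2)
    if len(parts) < 2:
        raise ValueError(f"Invalid Gitea source: {source}. Expected: gitea:owner/repo")
    return {
        "owner": parts[0],
        "repo": parts[1],
        "path": parts[2] if len(parts) > 2 else None,
    }


print(parse_gitea_source("gitea:myorg/docs"))
print(parse_gitea_source("gitea:myorg/docs/api/v1"))  # nested path kept intact
print(parse_gitea_source("gitea:myorg/*"))            # wildcard repository

try:
    parse_gitea_source("gitea:myorg")
except ValueError as exc:
    print(exc)
```

One edge case worth knowing: a trailing slash such as `gitea:myorg/` passes the length check and yields an empty repository name, so callers may want to validate that separately.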