51 changes: 51 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,51 @@
+name: ci
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+
+concurrency:
+  group: ci-${{ github.ref }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+
+jobs:
+  quality:
+    name: lint · types · coverage (3.12)
+    runs-on: ubuntu-latest
+    steps:
+      # NOTE: tags are pinned to digests by Renovate (see renovate.json).
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install
+        run: pip install -e ".[dev,cli,mcp,rest-api]"
+      - name: Ruff (lint)
+        run: ruff check .
+      - name: Ruff (format)
+        run: ruff format --check .
+      - name: Mypy
+        run: mypy src/
+      - name: Pytest + coverage gate
+        run: pytest --cov --cov-report=term-missing -q
+
+  tests:
+    name: tests (${{ matrix.python }})
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python: ["3.10", "3.11", "3.13"] # 3.12 is covered by the quality job
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python }}
+      - name: Install
+        run: pip install -e ".[dev,cli,mcp,rest-api]"
+      - name: Pytest
+        run: pytest -q
19 changes: 12 additions & 7 deletions docs/guides/agents.md
@@ -95,19 +95,24 @@ over the event log, not a static table.

 ## Discovering at scale

-The `discover` tool imports inline model lists or a JSON file:
+The `discover` tool imports inline model lists, a JSON file, or a config-drivable connector:

 ```json
 // discover(source_type="inline", models=[{"name": "...", "platform": "..."}])
 { "added": 12, "skipped": 0, "links_created": 8 }
+
+// discover(source_type="connector", connector_name="rest",
+// connector_config={"name": "mlflow", "url": "...", "items_path": "...", "name_field": "..."})
+{ "models_added": 40, "links_created": 12, "errors": [] }
 ```

-!!! info "Connector-based discovery runs from the SDK"
-    Pulling models *live* from SQL registries, REST APIs, or GitHub repos is done
-    through the SDK connectors (see [Connectors & discovery](connectors.md)) and then
-    persisted to a shared backend the agent reads. Wiring connector execution directly
-    into the `discover` tool is on the roadmap — until then, agents read the inventory
-    that scheduled connector runs populate.
+!!! info "Which connectors an agent can run"
+    `rest` and `prefect` are pure-config connectors, so an agent can run them directly
+    through `discover`. `sql` and `github` need a live database connection or a parser
+    callable that can't be expressed as JSON — for those, `discover` returns a message in
+    the result's `errors` field pointing you to the SDK (see
+    [Connectors & discovery](connectors.md)). Connector problems come back as `errors`
+    rather than raising, so the agent always gets a usable response.
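
Because connector problems are reported in-band, a caller can branch on the `errors` field instead of wrapping the call in `try`/`except`. The following is a minimal sketch of that pattern, assuming the response arrives as a plain dict with the `models_added`, `links_created`, and `errors` keys shown in the example above. `summarize_discover_result` is a hypothetical helper for illustration, not part of model-ledger.

```python
from typing import Any


def summarize_discover_result(result: dict[str, Any]) -> str:
    """Turn a discover-style response dict into a one-line status message.

    Hypothetical helper: assumes the keys from the example response
    (models_added, links_created, errors) and treats missing keys as empty.
    """
    errors = result.get("errors") or []
    if errors:
        # Failures are carried in the response, so check the field first.
        return "discover failed: " + "; ".join(errors)
    added = result.get("models_added", 0)
    links = result.get("links_created", 0)
    return f"added {added} models, created {links} links"


ok = summarize_discover_result({"models_added": 40, "links_created": 12, "errors": []})
failed = summarize_discover_result(
    {"models_added": 0, "links_created": 0, "errors": ["connector 'rest' failed: timeout"]}
)
```

The error string in the second call is made up for the example; the real wording depends on the connector.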

## Your docs are an agent surface, too

4 changes: 3 additions & 1 deletion pyproject.toml
@@ -157,7 +157,9 @@ source = ["src/model_ledger"]
 branch = true

 [tool.coverage.report]
-fail_under = 90
+# Honest floor: real coverage is ~74% (2026-06). The previous 90 was never
+# enforced (no CI ran --cov). Ratchet this upward as tests are added.
+fail_under = 70
 exclude_lines = [
     "pragma: no cover",
     "if TYPE_CHECKING:",
7 changes: 6 additions & 1 deletion src/model_ledger/__init__.py
@@ -11,6 +11,8 @@

 from __future__ import annotations

+from importlib.metadata import PackageNotFoundError
+from importlib.metadata import version as _pkg_version
 from typing import TYPE_CHECKING, Any

 from model_ledger.connectors import github_connector, rest_connector, sql_connector
@@ -120,7 +122,10 @@
     "TraceOutput",
 ]

-__version__ = "0.6.0"
+try:
+    __version__ = _pkg_version("model-ledger")
+except PackageNotFoundError:  # running from a source checkout without an install
+    __version__ = "0.0.0+unknown"


 def introspect(obj: Any, *, introspector: str | None = None) -> IntrospectionResult:
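
The `__version__` change in `__init__.py` replaces a hard-coded string with a lookup in the installed package metadata. A minimal standalone sketch of the same fallback pattern follows. `resolve_version` and the distribution name passed to it are hypothetical, chosen only to exercise the fallback branch.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str = "0.0.0+unknown") -> str:
    """Return the installed distribution's version, or a fallback if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # No installed metadata, e.g. running from a source checkout.
        return fallback


# Hypothetical name, picked so that no such distribution is installed.
missing = resolve_version("surely-not-an-installed-distribution-xyz")
```

Reading the version from metadata means the value declared in `pyproject.toml` is the only one to maintain, which is what lets `rest/app.py` drop its own stale `"0.5.0"` literal.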
3 changes: 2 additions & 1 deletion src/model_ledger/rest/app.py
@@ -23,6 +23,7 @@

 from fastapi import FastAPI, HTTPException

+from model_ledger import __version__
 from model_ledger.backends import batch_fallbacks
 from model_ledger.backends.ledger_protocol import LedgerBackend
 from model_ledger.core.exceptions import ModelNotFoundError
@@ -79,7 +80,7 @@ def create_app(
     app = FastAPI(
         title="Model Ledger API",
         description="REST API for model inventory and governance",
-        version="0.5.0",
+        version=__version__,
     )

     @app.post("/record", response_model=RecordOutput)
137 changes: 95 additions & 42 deletions src/model_ledger/tools/discover.py
@@ -3,12 +3,29 @@
 from __future__ import annotations

 import json
+from collections.abc import Callable
 from typing import Any

+from model_ledger.connectors import prefect_connector, rest_connector
 from model_ledger.graph.models import DataNode
+from model_ledger.graph.protocol import SourceConnector
 from model_ledger.sdk.ledger import Ledger
 from model_ledger.tools.schemas import DiscoverInput, DiscoverOutput, ModelSummary

+# Connectors whose entire configuration is plain data (no live connection
+# object, no callable), so an agent can drive them purely from JSON.
+_CONFIG_CONNECTORS: dict[str, Callable[[dict[str, Any]], SourceConnector]] = {
+    "rest": lambda config: rest_connector(**config),
+    "prefect": lambda config: prefect_connector(**config),
+}
+
+# Connectors that require a non-serializable argument an agent can't pass as
+# JSON — point the caller to the SDK instead of failing opaquely.
+_SDK_ONLY_CONNECTORS: dict[str, str] = {
+    "sql": "needs a live database connection",
+    "github": "needs a parser callable",
+}
+

 def _dict_to_datanode(d: dict[str, Any]) -> DataNode:
     """Convert a raw dict to a DataNode."""
@@ -21,67 +38,103 @@ def _dict_to_datanode(d: dict[str, Any]) -> DataNode:
     )


+def _error(message: str) -> DiscoverOutput:
+    return DiscoverOutput(models_added=0, models_skipped=0, links_created=0, errors=[message])
+
+
+def _ingest(nodes: list[DataNode], ledger: Ledger, auto_connect: bool) -> DiscoverOutput:
+    """Add nodes to the ledger, optionally connect, and summarize."""
+    add_result = ledger.add(nodes)
+    added = add_result["added"]
+    skipped = add_result["skipped"]
+
+    links_created = 0
+    if auto_connect and added > 0:
+        links_created = ledger.connect()["links_created"]
+
+    summaries: list[ModelSummary] = []
+    for node in nodes:
+        try:
+            ref = ledger.get(node.name)
+        except Exception:
+            continue
+        summaries.append(
+            ModelSummary(
+                name=ref.name,
+                owner=ref.owner,
+                model_type=ref.model_type,
+                platform=node.platform or None,
+                status=ref.status,
+            )
+        )
+
+    return DiscoverOutput(
+        models_added=added,
+        models_skipped=skipped,
+        links_created=links_created,
+        models=summaries,
+    )
+
+
+def _discover_via_connector(input: DiscoverInput, ledger: Ledger) -> DiscoverOutput:
+    """Build a config-driven connector, run it, and ingest the result.
+
+    Returns errors in ``DiscoverOutput.errors`` (never raises) so an agent gets
+    an actionable response instead of a crash.
+    """
+    name = input.connector_name
+    if not name:
+        return _error("connector_name is required when source_type is 'connector'")
+
+    if name in _SDK_ONLY_CONNECTORS:
+        reason = _SDK_ONLY_CONNECTORS[name]
+        return _error(
+            f"The '{name}' connector {reason}, which can't be supplied as JSON. "
+            f"Run it from the Python SDK: ledger.add({name}_connector(...).discover())"
+        )
+
+    factory = _CONFIG_CONNECTORS.get(name)
+    if factory is None:
+        return _error(
+            f"Unknown connector '{name}'. Config-drivable connectors: "
+            f"{sorted(_CONFIG_CONNECTORS)}. SDK-only connectors: {sorted(_SDK_ONLY_CONNECTORS)}."
+        )
+
+    try:
+        connector = factory(input.connector_config or {})
+        nodes = list(connector.discover())
+    except Exception as exc:
+        return _error(f"connector '{name}' failed: {exc}")
+
+    return _ingest(nodes, ledger, input.auto_connect)
+
+
 def discover(input: DiscoverInput, ledger: Ledger) -> DiscoverOutput:
     """Import models from external sources into the ledger.

     Supports three source types:

     - **inline**: models passed directly as a list of dicts.
     - **file**: models loaded from a JSON file on disk.
-    - **connector**: not yet supported — raises ``NotImplementedError``.
+    - **connector**: run a config-drivable connector (``rest``, ``prefect``) from
+      ``connector_config``. Connectors needing a live connection or a callable
+      (``sql``, ``github``) return a message in ``errors`` directing to the SDK.

-    When ``auto_connect`` is True and models were added, runs
-    ``ledger.connect()`` to auto-link dependencies based on matching
-    input/output ports.
+    When ``auto_connect`` is True and models were added, runs ``ledger.connect()``
+    to auto-link dependencies based on matching input/output ports.
     """
     if input.source_type == "connector":
-        raise NotImplementedError(
-            "Connector execution via tool not yet supported. Use the Python SDK directly."
-        )
+        return _discover_via_connector(input, ledger)

     if input.source_type == "file":
         if input.file_path is None:
             raise ValueError("file_path is required when source_type is 'file'")
         with open(input.file_path) as f:
             raw_models = json.load(f)
         nodes = [_dict_to_datanode(d) for d in raw_models]

     else:  # inline
         if input.models is None:
             raise ValueError("models is required when source_type is 'inline'")
         nodes = [_dict_to_datanode(d) for d in input.models]

-    # Add nodes to ledger (content-hash dedup)
-    add_result = ledger.add(nodes)
-    added = add_result["added"]
-    skipped = add_result["skipped"]
-
-    # Auto-connect dependencies if requested and models were added
-    links_created = 0
-    if input.auto_connect and added > 0:
-        connect_result = ledger.connect()
-        links_created = connect_result["links_created"]
-
-    # Build summaries for added models
-    summaries: list[ModelSummary] = []
-    for node in nodes:
-        try:
-            ref = ledger.get(node.name)
-            summaries.append(
-                ModelSummary(
-                    name=ref.name,
-                    owner=ref.owner,
-                    model_type=ref.model_type,
-                    platform=node.platform or None,
-                    status=ref.status,
-                )
-            )
-        except Exception:
-            pass
-
-    return DiscoverOutput(
-        models_added=added,
-        models_skipped=skipped,
-        links_created=links_created,
-        models=summaries,
-    )
+    return _ingest(nodes, ledger, input.auto_connect)
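
The dispatch in `_discover_via_connector` follows a common shape: two lookup tables, one for runnable factories and one for names that are deliberately rejected, with every failure folded into the return value. The sketch below reproduces that shape in isolation under simplifying assumptions. `Outcome`, `CONFIG_FACTORIES`, `SDK_ONLY`, `run_connector`, and the `"static"` connector are all hypothetical stand-ins, not model-ledger names.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Outcome:
    """Stand-in for a tool response: discovered names plus in-band errors."""

    names: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


# Hypothetical registries mirroring the two-table split in the diff.
CONFIG_FACTORIES: dict[str, Callable[[dict[str, Any]], Iterable[str]]] = {
    "static": lambda config: list(config["names"]),
}
SDK_ONLY: dict[str, str] = {"sql": "needs a live database connection"}


def run_connector(name: Optional[str], config: Optional[dict[str, Any]]) -> Outcome:
    """Dispatch by name; report every failure in Outcome.errors, never raise."""
    if not name:
        return Outcome(errors=["connector name is required"])
    if name in SDK_ONLY:
        return Outcome(errors=[f"'{name}' {SDK_ONLY[name]}; run it from the SDK"])
    factory = CONFIG_FACTORIES.get(name)
    if factory is None:
        return Outcome(errors=[f"unknown connector '{name}'"])
    try:
        # Bad config (here, a missing "names" key) surfaces as an exception
        # from the factory, so building and running share one try block.
        names = list(factory(config or {}))
    except Exception as exc:
        return Outcome(errors=[f"connector '{name}' failed: {exc!r}"])
    return Outcome(names=names)


good = run_connector("static", {"names": ["feature_pipeline", "scoring_model"]})
bad_config = run_connector("static", {})
sdk_only = run_connector("sql", {"query": "SELECT 1"})
unknown = run_connector("databricks", None)
```

Checking the rejected table before the factory table keeps the more specific message (why the connector cannot run from JSON) from being masked by a generic "unknown connector" error.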
75 changes: 71 additions & 4 deletions tests/test_tools/test_discover.py
@@ -158,13 +158,80 @@ def test_file_none_path_raises(self, ledger):


 class TestDiscoverConnector:
-    """Connector source_type — should raise NotImplementedError."""
+    """Connector source_type — config-drivable connectors run; others return
+    a graceful error in DiscoverOutput.errors rather than raising."""

-    def test_connector_raises(self, ledger):
+    def test_unknown_connector_returns_error(self, ledger):
         inp = DiscoverInput(
             source_type="connector",
             connector_name="databricks",
             connector_config={"workspace": "test"},
         )
-        with pytest.raises(NotImplementedError, match="not yet supported"):
-            discover(inp, ledger)
+        result = discover(inp, ledger)
+        assert result.models_added == 0
+        assert result.errors and "databricks" in result.errors[0]
+
+    def test_missing_connector_name_returns_error(self, ledger):
+        inp = DiscoverInput(source_type="connector", connector_name=None)
+        result = discover(inp, ledger)
+        assert result.models_added == 0
+        assert result.errors and "connector_name" in result.errors[0]
+
+    def test_sql_connector_directs_to_sdk(self, ledger):
+        """sql needs a live connection — can't come from JSON; point to the SDK."""
+        inp = DiscoverInput(
+            source_type="connector",
+            connector_name="sql",
+            connector_config={"query": "SELECT 1"},
+        )
+        result = discover(inp, ledger)
+        assert result.models_added == 0
+        assert result.errors
+        msg = result.errors[0].lower()
+        assert "sdk" in msg or "connection" in msg
+
+    def test_github_connector_directs_to_sdk(self, ledger):
+        inp = DiscoverInput(source_type="connector", connector_name="github")
+        result = discover(inp, ledger)
+        assert result.models_added == 0
+        assert result.errors
+
+    def test_rest_bad_config_returns_error(self, ledger):
+        """Missing required rest config is caught, not raised."""
+        inp = DiscoverInput(source_type="connector", connector_name="rest", connector_config={})
+        result = discover(inp, ledger)
+        assert result.models_added == 0
+        assert result.errors and "rest" in result.errors[0]
+
+    def test_config_connector_runs_and_ingests(self, ledger, monkeypatch):
+        """A config-drivable connector is built from config, run, and ingested."""
+        from importlib import import_module
+
+        from model_ledger.graph.models import DataNode
+
+        # tools/__init__ rebinds `discover` to the function, so fetch the module
+        # object itself to monkeypatch its connector registry.
+        discover_mod = import_module("model_ledger.tools.discover")
+
+        class _StubConnector:
+            name = "rest"
+
+            def discover(self):
+                return [
+                    DataNode("feature_pipeline", platform="rest", outputs=["feature_table"]),
+                    DataNode("scoring_model", platform="rest", inputs=["feature_table"]),
+                ]
+
+        monkeypatch.setitem(
+            discover_mod._CONFIG_CONNECTORS, "rest", lambda config: _StubConnector()
+        )
+        inp = DiscoverInput(
+            source_type="connector",
+            connector_name="rest",
+            connector_config={"url": "https://x", "items_path": "i", "name_field": "n"},
+            auto_connect=True,
+        )
+        result = discover(inp, ledger)
+        assert result.models_added == 2
+        assert result.links_created >= 1
+        assert result.errors == []
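
The last test swaps one registry entry for a stub with pytest's `monkeypatch.setitem`. The same idea can be sketched without pytest using the standard library's `unittest.mock.patch.dict`, which is a swapped-in equivalent used here only so the example runs on its own. `REGISTRY` and `run` are hypothetical names.

```python
from unittest.mock import patch

# Hypothetical module-level registry standing in for a connector table.
REGISTRY = {"rest": lambda config: ["real_model"]}


def run(name, config):
    """Look the factory up at call time so a patched entry is picked up."""
    return REGISTRY[name](config)


# Replace the "rest" entry only for the duration of the with-block.
with patch.dict(REGISTRY, {"rest": lambda config: ["stub_a", "stub_b"]}):
    patched = run("rest", {})

# Outside the context manager the original entry is restored.
restored = run("rest", {})
```

The stub only takes effect because the lookup happens inside `run` at call time. Code that had captured the factory in a local variable at import time would not see the patch.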