gauthierpiarrette/dbt-features

dbt-features

Feature catalog for dbt projects, built for ML teams.

Live demo · Docs

Screenshot: catalog index showing feature groups grouped by entity, with faceted filters for type, lifecycle, freshness, and owner.


Install

pip install dbt-features

For warehouse enrichment (freshness, row counts, null %), install the extra for your warehouse:

pip install 'dbt-features[duckdb]'       # local / dbt-duckdb
pip install 'dbt-features[postgres]'     # Postgres
pip install 'dbt-features[redshift]'     # Redshift
pip install 'dbt-features[snowflake]'    # Snowflake
pip install 'dbt-features[bigquery]'     # BigQuery

Requires Python 3.10+. The base install does not depend on dbt-core.

Quickstart

Try it with no setup, using bundled sample data served on a free port:

dbt-features demo

On your dbt project:

dbt parse
dbt-features build --connection my_profile --output ./catalog
dbt-features serve --output ./catalog

In CI, gate on the metadata with the linter:

dbt-features lint --strict                       # fail on errors and warnings
dbt-features lint --entity-catalog entities.yml  # also enforce entity names
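
If your CI steps are driven from Python rather than a shell script, the gate can be sketched as a small wrapper that stops at the first failing command. This is a minimal sketch, not part of dbt-features: `run_gate` is a hypothetical helper, and it assumes the linter signals failure through a non-zero exit code, which "fail on errors and warnings" implies but the text above does not state outright.

```python
import subprocess


def run_gate(commands):
    """Run each command in order and stop at the first failure.

    Returns the first non-zero exit code, or 0 if every command succeeded.
    Assumption: a failed check is reported as a non-zero exit code.
    """
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

A CI entry point could then call `sys.exit(run_gate([["dbt-features", "lint", "--strict"]]))` so the job fails whenever the linter does.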

For production (with type inference and correct schemas):

dbt docs generate                              # populates catalog.json with column types
dbt-features build \
    --manifest target/manifest.json \
    --catalog target/catalog.json \
    --connection my_profile --target prod \
    --output ./catalog

See Production setup for details.

What it does

  • Reads your dbt manifest and finds models marked is_feature_table: true.
  • Renders a static HTML site: feature groups, features, lineage, ML model consumers.
  • With --connection, pulls freshness, row counts, null %, and cardinality from the warehouse.

Read-only. No backend, no database.
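
To make the first step concrete, here is a rough standard-library approximation of the manifest scan. It is a sketch under assumptions, not the tool's own implementation: `find_feature_tables` is a hypothetical name, and the code assumes the `is_feature_table` flag sits in a model's `meta` or `config.meta` block of `manifest.json`, which the description above does not specify.

```python
import json
from pathlib import Path


def find_feature_tables(manifest_path):
    """Return the sorted names of models flagged is_feature_table: true.

    Assumption: the flag is stored under the node's `meta` or `config.meta`.
    """
    manifest = json.loads(Path(manifest_path).read_text())
    names = []
    for node in manifest.get("nodes", {}).values():
        # Skip seeds, tests, snapshots and other non-model nodes.
        if node.get("resource_type") != "model":
            continue
        # Merge both possible locations; node-level meta wins on conflict.
        meta = dict((node.get("config") or {}).get("meta") or {})
        meta.update(node.get("meta") or {})
        if meta.get("is_feature_table") is True:
            names.append(node["name"])
    return sorted(names)
```

After `dbt parse`, calling `find_feature_tables("target/manifest.json")` would list the candidate feature tables, given the assumptions above.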

Python API

Downstream tools can build on the parsed catalog instead of re-reading manifest.json:

from dbt_features import parse_project

catalog = parse_project("path/to/dbt/project")
for group in catalog.feature_groups:
    for feature in group.features:
        print(feature.name, feature.consumers_derived, feature.consumers_declared)
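
As an illustration of what a downstream tool might do with that loop, the sketch below collects features that have no consumers at all. `unconsumed_features` is a hypothetical helper, not part of the package, and it assumes `consumers_derived` and `consumers_declared` are collections that are empty (or `None`) when nothing consumes the feature. It uses only the attributes shown in the example above.

```python
def unconsumed_features(catalog):
    """Return names of features with neither derived nor declared consumers.

    Works on any object shaped like the catalog in the example above.
    Assumption: both consumer attributes are falsy when there are no consumers.
    """
    orphans = []
    for group in catalog.feature_groups:
        for feature in group.features:
            if not feature.consumers_derived and not feature.consumers_declared:
                orphans.append(feature.name)
    return orphans
```

Passing the result of `parse_project(...)` to this function would give a list of candidates for deprecation review, if the assumption about the consumer attributes holds.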

Stability: parse_project, Catalog, FeatureGroup, Feature, ExposureInfo, and LineageRef (all re-exported from dbt_features) form the public, provisional surface. They are intended to be stable but are not yet semver-locked: the resolved data model is still settling, so until the surface is frozen, expect only additive changes, each called out in the changelog. Anything prefixed with _, the renderer/enrichment internals, and the HTML layout are internal and may change at any time. Feature.used_by is deprecated in favour of consumers_derived / consumers_declared and is slated for removal.

Docs

Development

git clone https://github.com/gauthierpiarrette/dbt-features
cd dbt-features
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest

License

Apache 2.0. Not affiliated with dbt Labs.
