feat(database): add databricks oauth support#41421
Conversation
get_oauth2_token clobbered the config's already-resolved token_request_uri
with the AWS template that still contained an unsubstituted account-id
placeholder, so the token exchange POSTed to .../accounts/{}/v1/token. Only
fall back to the AWS endpoint when no token_request_uri is configured.
Co-authored-by: fabian_zse <fabian@zalando.de>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #41421 +/- ##
==========================================
- Coverage 64.44% 64.38% -0.06%
==========================================
Files 2655 2656 +1
Lines 145491 145741 +250
Branches 33585 33626 +41
==========================================
+ Hits 93762 93842 +80
- Misses 50028 50195 +167
- Partials 1701 1704 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Code Review Agent Run #0cb4e7
Actionable Suggestions - 1
-
docs/user_docs_versioned_docs/version-6.0.0/configuration/databases.mdx - 1
- Missing account_id placeholder substitution · Line 545-548
Additional Suggestions - 5
-
docs/user_docs_versioned_docs/version-6.0.0/configuration/databases.mdx - 1
-
Azure token_request_uri placeholder not substituted · Line 547-548The Azure entry in `_oauth2_endpoints` (`databricks.py:293-294`) has a `{}` placeholder in `token_request_uri` (`https://login.microsoftonline.com/{}/oauth2/v2.0/token`). Unlike `authorization_request_uri` where the code in `get_oauth2_authorization_uri` can override it, the `token_request_uri` is set in `get_oauth2_config` (base.py:745-747) from the class attribute which is empty (""), then never updated. Additionally, `get_oauth2_token` (databricks.py:605-614) only provides a fallback for missing token_request_uri using the AWS endpoint — it never extracts a per-provider tenant_id. The documentation at line 547 only flags `authorization_request_uri` as needing override, omitting `token_request_uri`, so users will configure and still fail silently.
-
-
tests/unit_tests/db_engine_specs/test_databricks.py - 2
-
Duplicated fixture definitions · Line 418-432The `oauth2_config_python` fixture (lines 418-432) is functionally identical to `oauth2_config_native` (lines 402-416) — both return the same dictionary. This duplication adds maintenance burden and divergence risk if one needs to change.
-
Semantic test duplication · Line 667-701This test is semantically duplicated from `test_get_oauth2_fresh_token_native` (lines 630-664). Both tests use identical fixtures (the `request_content_type: "json"` configs are structurally identical), call the same inherited method, and assert identical expectations. Since neither `DatabricksNativeEngineSpec` nor `DatabricksPythonConnectorEngineSpec` overrides `get_oauth2_fresh_token()`, testing them separately provides no additional coverage. The duplication creates maintenance risk — any future change must be applied twice, risking divergence.
-
-
superset/db_engine_specs/databricks.py - 2
-
Duplicate import across diff · Line 575-575The `from typing import cast` import appears 4 times (lines 575, 603, 856, 884) — twice in each child class. Consolidating the methods into the parent class will eliminate 3 of these. The remaining one should be moved to module-level imports.
-
Docstring Missing Accepted Values · Line 310-326The `_detect_cloud_provider` method accepts 'cloud_provider' from extra configuration but only validates against known providers after lowercasing. Consider documenting accepted values and what happens with invalid values.
-
Filtered by Review Rules
Bito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
-
superset/db_engine_specs/databricks.py - 2
- Semantic duplication across classes · Line 557-587
- Semantic duplication across classes · Line 589-616
-
tests/unit_tests/db_engine_specs/test_databricks.py - 1
- Missing pytest.fixture decorator · Line 402-432
Review Details
-
Files reviewed - 4 · Commit Range:
dafe036..47b7e01- docs/user_docs_versioned_docs/version-6.0.0/configuration/databases.mdx
- superset/db_engine_specs/databricks.py
- tests/unit_tests/db_engine_specs/test_databricks.py
- tests/unit_tests/db_engine_specs/test_databricks_multi_cloud.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
…uth fixtures Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code Review Agent Run #71d2acActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
…figured URIs
The per-cloud OAuth2 endpoint templates carry a `{}` placeholder for the
Databricks account id (or Azure tenant id) that was never substituted, so
auto-detected authorize/token URLs were emitted as `.../accounts/{}/v1/...`.
The authorization-URI methods also unconditionally overwrote a fully-resolved
`authorization_request_uri` supplied via DATABASE_OAUTH2_CLIENTS.
- Add `_resolve_oauth2_endpoint`: substitutes `account_id`/`tenant_id` from the
database extra into the template, or raises OAuth2Error when absent instead of
issuing a request to an unresolved endpoint.
- Preserve a configured `authorization_request_uri`; only auto-detect/resolve
when none is set.
- `get_oauth2_token` has no database context to auto-detect, so fail fast on a
missing `token_request_uri` rather than POST to `.../{}/v1/token`.
- Cover auto-detect/resolve, preserve-configured, and fail-fast paths for both
the native and Python-connector specs; document `account_id`/`tenant_id`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The multi-cloud OAuth2 URI tests passed a config with a fully-resolved authorization_request_uri, which the engine spec now preserves. Drop the URI for the Azure/GCP detection cases (and give those mock databases an account_id/tenant_id) so the per-provider endpoint is actually resolved. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The authorization endpoint auto-resolves from the hostname, but the token exchange has no database context, so token_request_uri must be supplied for the auto-detected flow. Docs implied both endpoints auto-detect. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code Review Agent Run #909191Actionable Suggestions - 0Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
SUMMARY
OAuth 2.0 support for Databricks, across AWS, Azure, and GCP.
Adopts #34619 (original work by @drummerwolli, who stepped away). I re-applied their commits on current master and fixed the failing unit tests. Authorship is preserved via co-author trailers.
The only real fix on top of the original work:
get_oauth2_tokenwas overwriting the config's already-resolvedtoken_request_uriwith the AWS template that still had an unsubstituted account-id placeholder, so the token exchange POSTed to.../accounts/{}/v1/token. It now only falls back to the AWS endpoint when no URI is configured. No behavioral change to the feature otherwise.TESTING INSTRUCTIONS
Unit tests:
pytest tests/unit_tests/db_engine_specs/test_databricks.py tests/unit_tests/db_engine_specs/test_databricks_multi_cloud.pyADDITIONAL INFORMATION