Enhance Oracle Database MCP Toolkit: Expand RAG Toolset, Improve Configuration, and Fix Issues #279

Open
MouhsinElmajdouby wants to merge 35 commits into oracle:main from Youssef-Erradi:main

Conversation

@MouhsinElmajdouby commented May 21, 2026

Member

Description

This PR improves the Oracle Database MCP toolkit with the following changes:

  • Add a new RAG toolset that includes:
    • vector-store: create and list vector stores for document embeddings.
    • vector-model: list and drop ONNX embedding models registered in the database.
    • embed: embed documents from local files, Oracle tables, OCI objects, or OCI buckets, with PAR URL support.
    • task: track and list async embedding jobs.
    • oci-storage: list OCI bucket objects.
  • Fix read-query serialization failures for Oracle TIMESTAMP values.
  • Add support for an enabled setting on custom tools and custom toolsets, with clear precedence rules between tool-level and toolset-level settings. Custom tools and toolsets are enabled by default, and edit-tool now supports enabling or disabling tools.
  • Fix config poller behavior so it only runs when a config file is provided, avoiding errors when the MCP toolkit starts without a config file.
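As an illustration of the TIMESTAMP fix listed above, here is a minimal sketch of one way to make row values JSON-safe before serialization. It is not taken from this PR: the class name RowValueNormalizer and its methods are hypothetical, and it assumes the value has already been read as a java.sql.Timestamp (the Oracle driver can also return its own timestamp type, which the actual change may handle differently).

```java
import java.sql.Timestamp;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the code in this PR.
final class RowValueNormalizer {

    private RowValueNormalizer() {}

    /**
     * Converts JDBC temporal values to ISO-8601 strings so a JSON
     * serializer only ever sees plain strings. Other values pass through.
     */
    static Object normalize(Object value) {
        if (value instanceof Timestamp) {
            Timestamp ts = (Timestamp) value;
            return DateTimeFormatter.ISO_LOCAL_DATE_TIME.format(ts.toLocalDateTime());
        }
        if (value instanceof java.sql.Date) {
            return ((java.sql.Date) value).toLocalDate().toString();
        }
        return value;
    }

    /** Normalizes every column value of a row, preserving column order. */
    static Map<String, Object> normalizeRow(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        row.forEach((column, value) -> out.put(column, normalize(value)));
        return out;
    }
}
```

Normalizing at the row level keeps the serializer configuration untouched, at the cost of losing the original type information in the response.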

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • Built the new MCP server JAR locally using: mvn clean package.
  • Tested the new RAG toolset locally.
  • Verified the read-query tool returns Oracle TIMESTAMP values without JSON serialization errors.
  • Verified the MCP toolkit starts successfully without a config file and does not trigger config poller errors.
  • Verified the custom tool and toolset enabled behavior, including default enablement.
  • Tested the MCP server in both supported transport modes: stdio and https.

Test Configuration:

  • Firmware version: N/A.
  • Hardware: Local development machine
  • Toolchain: JDK 17+, Maven 3.9+.
  • SDK: N/A.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

MouhsinElmajdouby and others added 19 commits February 11, 2026 16:39
Support enabled on custom tools/toolsets and clarify enablement precedence
…alization

Fix read-query serialization failure for Oracle TIMESTAMP values
…ogging

Fixing error logging when missing config file
@oracle-contributor-agreement (bot) added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) May 21, 2026
@MouhsinElmajdouby changed the title from "Expand RAG Toolset and Improve MCP Toolkit Configuration" to "Enhance Oracle Database MCP Toolkit: Expand RAG Toolset, Improve Configuration, and Fix Issues" May 21, 2026
@krisrice

Member

Automated review result: needs security/manual review.

Reason:

  • This PR expands the Oracle Database MCP Toolkit with database RAG/vector tools, vector-store creation, background embedding jobs, DBMS_CLOUD/Object Storage access, credential listing, SQL execution paths, and admin/runtime tool behavior.

Manual review should verify:

  • Server-side authorization and least-privilege scope.
  • Parameter validation and bounded output/pagination.
  • Secret/customer-data redaction in logs, errors, and returned content.
  • Retrieved database/Object Storage content is treated as untrusted data, not instructions.
  • Tests for auth failure, validation failure, side effects, bounded output, and redaction where applicable.

No automated approval or merge was performed.

@krisrice

Member

Security review remediation notes for the new RAG/admin surfaces. Please address before merge.

Blocking findings:

  1. edit-tools is a persistent tool-injection path. It accepts caller-provided tool name, data source, SQL statement, enabled state, and parameters, writes them to the YAML config, and hot-reloads them without role gating, confirmation, statement policy, allowlist, or audit. Require explicit admin-only exposure, validate the statement/data source, audit changes, and consider disabling this tool by default.

  2. SQL/JSON injection risks exist in vector embedding paths. modelName is formatted directly into VECTOR_EMBEDDING(%s USING ? AS data), allowedExtensions are concatenated into SQL in buildExtensionCondition, and embedding params JSON is built via string formatting. Replace these with strict allowlists/identifier validation and proper JSON serialization.

  3. embed file/files can read arbitrary server-local files. Caller-supplied filePath/filePaths are passed to Files.readAllBytes without canonicalization, root allowlist, extension policy, max file size, or secret-file blocking. Add a configured ingest root and reject paths outside it.

  4. OCI object/bucket ingestion accepts arbitrary raw URLs and exposes credential inventory. Validate scheme/host/region/namespace/bucket against allowlists, URL-encode path components, cap list results, and do not expose DBMS_CLOUD credential names/usernames except to an explicitly admin-gated tool.

  5. Background embedding work is unbounded. newCachedThreadPool, unbounded task storage, whole-bucket ingestion, and no per-user/process limits create DoS/cost risk. Add bounded executor, queue/task caps, cancellation/expiry, file/object count limits, byte limits, and progress/error redaction.
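For finding 3, the "configured ingest root" check could look roughly like the sketch below. This is an assumption-laden illustration, not code from the PR: IngestPathGuard, its constructor arguments, and the size limit are hypothetical names, and extension policy and secret-file blocking are left out for brevity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical guard for illustration only; not the code in this PR.
final class IngestPathGuard {

    private final Path root;
    private final long maxBytes;

    IngestPathGuard(Path ingestRoot, long maxBytes) throws IOException {
        // Resolve symlinks in the root once so later comparisons are stable.
        this.root = ingestRoot.toRealPath();
        this.maxBytes = maxBytes;
    }

    /**
     * Returns the canonical path for a caller-supplied path, or throws
     * SecurityException if it escapes the ingest root, is not a regular
     * file, or exceeds the size limit. Messages omit the path on purpose.
     */
    Path resolve(String callerPath) throws IOException {
        // An absolute callerPath replaces the root here, so it fails the check below.
        Path candidate = root.resolve(callerPath).normalize();
        if (!candidate.startsWith(root)) {
            throw new SecurityException("Path is outside the ingest root");
        }
        // toRealPath follows symlinks and throws if the file does not exist.
        Path real = candidate.toRealPath();
        if (!real.startsWith(root)) {
            throw new SecurityException("Path is outside the ingest root");
        }
        if (!Files.isRegularFile(real)) {
            throw new SecurityException("Not a regular file");
        }
        if (Files.size(real) > maxBytes) {
            throw new SecurityException("File exceeds the size limit");
        }
        return real;
    }
}
```

Only the path returned by resolve would then be handed to Files.readAllBytes. The check runs twice because normalize alone does not catch symlinks that point outside the root.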

OWASP GenAI/LLM mapping: LLM02 sensitive information disclosure, LLM05 improper output/tool handling, LLM06 excessive agency, LLM08 vector/embedding weaknesses, and LLM10 unbounded consumption.
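For finding 5 (LLM10), a bounded replacement for newCachedThreadPool might be sketched as follows. The class name BoundedEmbeddingExecutor and the thread and queue limits are hypothetical and would need to come from configuration. Task expiry, cancellation, and per-user limits are not covered here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory for illustration only; not the code in this PR.
final class BoundedEmbeddingExecutor {

    private BoundedEmbeddingExecutor() {}

    /**
     * Creates a pool with a fixed upper bound on threads and on queued
     * jobs. Once both are full, new jobs are rejected instead of piling up.
     */
    static ThreadPoolExecutor create(int maxThreads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
        // Let idle threads exit so an unused pool holds no resources.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    /** Returns true if the job was accepted, false if the pool is at capacity. */
    static boolean trySubmit(ThreadPoolExecutor pool, Runnable job) {
        try {
            pool.execute(job);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }
}
```

A false result from trySubmit can be returned to the caller as a "server busy" tool error, which keeps the cost of a burst of embed requests bounded by the configured limits.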

Labels

OCA Verified: All contributors have signed the Oracle Contributor Agreement.

4 participants