AI Data Foundation

AI Data Foundation is an open-source reference implementation for building governed data access layers for MCP and AI agent applications.

It focuses on a common problem: AI agents should be able to retrieve business data, but must not bypass tenant isolation, object-level permissions, field masking, audit trails, or source-of-truth data pipelines.

Why it matters

Many AI / MCP / agent demos stop at “the model can query data”.

This project focuses on the harder part:

how external data is ingested without skipping Raw / event / outbox boundaries
how source-facing and canonical models stay separated
how OAuth, API keys, service accounts, permission enforcement, masking, and audit are applied before business data is returned
how retrieval can support agent use cases without turning into an unsafe direct-database shortcut

What it provides

Raw -> Source Change Event -> Outbox ingestion pipeline
Source and canonical customer / order models
OAuth 2.1 Authorization Code + PKCE demo authorization server
API Key and service-account access
Permission enforcement, masking, and audit logs
Candidate-only retrieval followed by authorized canonical backfill
Admin Console for governance workflows

Product preview

[Screenshot: landing-style overview of the governed access model]

[Screenshot: example governance UI for MCP tools, rollout rules, masking, and audit controls]

Project principles

This repository is guided by a few non-negotiable principles:

Raw first: source data should land in Raw before downstream normalization or retrieval
Event boundaries matter: ingestion and downstream publication should respect Change Envelope and outbox boundaries
Source and canonical models are different layers: source models preserve source semantics, while canonical models represent unified business meaning
Permissions are enforced server-side: MCP / API / agent callers must not define their own tenant, user, or permission scope
Retrieval must stay safe: candidate recall is not authorization, and all final business data must flow through authorized canonical backfill
Audit is part of the product: governed data access is incomplete if access, masking, and decision paths are not traceable
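
The retrieval and permission principles above can be illustrated with a minimal in-memory sketch. It is not the repository's actual code: the record shapes, the `canRead` rule, the `maskPhone` helper, and the sample rows are all hypothetical. It only shows the intended ordering: candidate recall returns IDs, the caller's tenant and user come from server-side context, and business fields are returned only after an authorization check, masking, and an audit entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GovernedRetrievalSketch {

    // Hypothetical canonical row; field names are illustrative only.
    record Customer(String id, String tenantId, String ownerId, String name, String phone) {}

    // Caller context resolved server-side (for example from a validated token),
    // never taken from request parameters supplied by the agent.
    record Caller(String tenantId, String userId, Set<String> roles) {}

    // In-memory stand-in for a canonical customer table.
    static final Map<String, Customer> CANONICAL = Map.of(
            "c1", new Customer("c1", "t1", "sales001", "Alice", "13800001234"),
            "c2", new Customer("c2", "t1", "sales002", "Bob", "13800005678"),
            "c3", new Customer("c3", "t2", "sales001", "Carol", "13900009999"));

    // Step 1: candidate recall returns IDs only, with no business fields.
    static List<String> recallCandidateIds(String query) {
        return List.of("c1", "c2", "c3"); // pretend every row matched the query
    }

    // Assumed rule: same tenant, and either an admin or the owning user.
    static boolean canRead(Caller caller, Customer c) {
        if (!caller.tenantId().equals(c.tenantId())) return false;
        return caller.roles().contains("admin") || caller.userId().equals(c.ownerId());
    }

    // Keeps the last four characters and masks the rest.
    static String maskPhone(String phone) {
        if (phone.length() <= 4) return "****";
        return "*".repeat(phone.length() - 4) + phone.substring(phone.length() - 4);
    }

    // Step 2: authorized backfill. Every decision is audited, allowed or not.
    static List<Customer> backfill(Caller caller, List<String> candidateIds, List<String> auditLog) {
        List<Customer> result = new ArrayList<>();
        for (String id : candidateIds) {
            Customer c = CANONICAL.get(id);
            boolean allowed = c != null && canRead(caller, c);
            auditLog.add(caller.userId() + " " + id + " " + (allowed ? "ALLOW" : "DENY"));
            if (!allowed) continue;
            boolean mayViewPhone = caller.roles().contains("admin");
            result.add(mayViewPhone ? c
                    : new Customer(c.id(), c.tenantId(), c.ownerId(), c.name(), maskPhone(c.phone())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> audit = new ArrayList<>();
        Caller sales = new Caller("t1", "sales001", Set.of("sales"));
        backfill(sales, recallCandidateIds("alice"), audit).forEach(System.out::println);
        audit.forEach(System.out::println);
    }
}
```

In this sketch a candidate that matched the query but fails the check never leaves the server, and the denial still shows up in the audit log.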

5-minute quick start

Requirements

Java 21
Maven 3.9+
Docker

Start infrastructure

docker compose up -d postgres redpanda

Start the application

mvn spring-boot:run

Open the main endpoints

Admin Console: http://localhost:8080/admin
OAuth discovery: http://localhost:8080/.well-known/openid-configuration
Customer API example: http://localhost:8080/api/customers?limit=10

Demo login

admin / admin123
sales001 / sales123

Current runnable data shape

The default runnable setup uses realistic mock data rather than a live JKYun production connection.

Current mock scale includes:

1,000 JKYun customers
3,600 JKYun orders
12,573 order lines
520 refund records
external-crm and retail-pos overlap samples

Current capabilities

The current Phase 0 / Phase 1 implementation includes:

mock customer / trade ingestion through a connector-driven pipeline
Raw object persistence plus raw_records, source_change_events, and source_event_outbox
customer and order normalizers
source_customers, source_trades, canonical_customers, canonical_orders, and identity_map
canonical_change_events
customer and order query APIs with Phase 1 permission enforcement
local Search / Vector-like retrieval for governed candidate recall
customer knowledge candidate-only serving API
OAuth 2.1 demo authorization server with PKCE, JWKS, introspection, and revoke
Admin Console with 6 governance workspaces, ECharts topology/trend views, replay / backfill, status mapping, MDM, reconcile, permission simulation, MCP sessions, and audit foundations
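
For readers unfamiliar with the PKCE part of the OAuth flow listed above, here is a minimal sketch of the S256 challenge as defined in RFC 7636. The class and method names are hypothetical and not taken from this repository. The client sends the challenge with the authorization request, and the authorization server later recomputes it from the verifier presented at the token endpoint.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

public class PkceSketch {

    // Client side: 32 random bytes, base64url-encoded without padding (43 characters).
    static String newVerifier() {
        byte[] bytes = new byte[32];
        new SecureRandom().nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // S256 method: code_challenge = BASE64URL(SHA-256(ASCII(code_verifier))).
    static String challengeFor(String verifier) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(verifier.getBytes(StandardCharsets.US_ASCII));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    // Server side: compare the recomputed challenge with the one stored for the
    // authorization code, using a constant-time comparison.
    static boolean verify(String verifier, String storedChallenge) {
        return MessageDigest.isEqual(
                challengeFor(verifier).getBytes(StandardCharsets.US_ASCII),
                storedChallenge.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        String verifier = newVerifier();
        String challenge = challengeFor(verifier);
        System.out.println("code_verifier  = " + verifier);
        System.out.println("code_challenge = " + challenge);
        System.out.println("verified       = " + verify(verifier, challenge));
    }
}
```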

Project status

Current status: active early-stage reference implementation.

What is stable enough to explore today:

governed ingestion and normalization with realistic mock business data
customer / order serving paths with permission enforcement and audit
OAuth PKCE demo authorization flow for MCP / agent-facing access
Admin Console workflows for governance, replay, MDM, and operational inspection

What is still intentionally incomplete:

live JKYun production data integration
full production deployment posture
full standalone Permission Service rollout
production search / vector / analytics infrastructure

This means the repository is suitable for:

architecture review
OSS evaluation
local demos
governed MCP / agent integration experiments

It should not yet be presented as:

a finished production data platform
a live production JKYun connector
a complete enterprise identity and authorization product

Who this is for

This repository is a good fit for:

engineers building MCP / AI agent integrations that need governed business-data access
teams evaluating how to combine ingestion, canonical modeling, OAuth, permission checks, masking, and audit in one reference stack
architects who want a concrete example of Raw -> event -> canonical -> authorized retrieval boundaries
product or platform teams exploring safe retrieval patterns before connecting real production systems

Who this is not for

This repository is not a good fit if you need:

a drop-in production JKYun connector with real tenant credentials already integrated
a finished enterprise permission platform with full production policy lifecycle and organization sync
a production-ready OpenSearch / Qdrant / ClickHouse stack out of the box
a minimal toy MCP demo that ignores audit, masking, tenant isolation, and data-governance boundaries

Repository layout

High-level repository structure:

src/main/java/ — application code for ingestion, normalization, serving, auth, permission, audit, search, MDM, replay, and admin flows
src/main/resources/ — application config, Flyway migrations, and Admin Console static assets
src/test/java/ — unit and integration tests
docs/ — architecture, execution contract, implementation status, MCP auth design, and product-completion planning
mock-data/ — generated mock customer / order / refund / overlap data used for runnable demos
scripts/ — helper scripts such as OAuth PKCE local flow checks
docker-compose.yml — local infrastructure bootstrap
docs/05-local-runbook.md — detailed local runbook and command reference

Key docs

OSS reviewer checklist

If you are reviewing this repository as an OSS evaluator, the fastest path is:

read the top sections of this README for scope, status, and non-goals
check docs/02-implementation-status.md to see what is implemented versus still intentionally incomplete
inspect CHANGELOG.md for the current tagged capability snapshot
run docker compose up -d postgres redpanda and mvn spring-boot:run
verify the main endpoints:
- http://localhost:8080/admin
- http://localhost:8080/.well-known/openid-configuration
- http://localhost:8080/api/customers?limit=10
run mvn test

Things to keep in mind while reviewing:

this repository is intentionally honest about current boundaries
mock data is part of the runnable demo flow
real JKYun production integration is not yet complete
the project is designed as a governed reference implementation, not as a minimal MCP toy example

Roadmap

Near-term priorities:

connect real JKYun business data through the existing governed ingestion boundaries
harden OAuth / SSO integration beyond the current demo authorization-server slice
continue pushing MCP-safe permission, masking, audit, and retrieval patterns
validate deployment, operations, and observability for more production-like environments

Medium-term priorities:

evolve local retrieval into production-grade search / vector infrastructure
expand business-domain coverage beyond the current customer / order-centered slice
separate and harden Permission Service responsibilities where needed
improve production readiness for backup, alerting, and multi-environment rollout

Explicit non-goals for the current version

This repository does not currently claim:

live JKYun production API integration
production-grade multi-node deployment
full enterprise permission-service rollout
real OpenSearch / Qdrant / external LLM RAG production integration
final business-domain coverage for inventory, suppliers, refunds, and export execution

Contributing

Before opening a PR or making architecture-sensitive changes, read:

Additional repository governance files:

If you want to contribute code or docs, please also check the issue templates and PR template under .github/.

Architecture snapshot

The current implementation focuses on a minimum runnable end-to-end slice:

mock JKYun customer / order data
  -> raw object temporary storage
  -> PostgreSQL transaction writes raw_records + source_change_events + source_event_outbox
  -> raw object linked after transaction commit
  -> source_event_publisher sends Change Envelope to Redpanda
  -> customer / trade normalizer
  -> source_customers/source_trades + identity_map + canonical_customers/canonical_orders
  -> canonical_change_events
  -> customer query API with basic tenant / role checks
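
The transactional step in the flow above (raw_records, source_change_events, and source_event_outbox written together, with publication happening afterwards) follows the general outbox pattern. The sketch below emulates that pattern with in-memory lists. It is an illustration under stated assumptions, not the project's implementation: the class, the `ingest` and `publishPending` methods, and the event naming are hypothetical, and a real version would use a PostgreSQL transaction and a Redpanda producer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class OutboxSketch {

    record OutboxRow(long eventId, String payload) {}

    // In-memory stand-ins for raw_records, source_change_events, source_event_outbox.
    final List<String> rawRecords = new ArrayList<>();
    final List<String> sourceChangeEvents = new ArrayList<>();
    final List<OutboxRow> pendingOutbox = new ArrayList<>();
    private long nextEventId = 1;

    // Emulates one database transaction: either all three writes become
    // visible, or none of them do.
    synchronized boolean ingest(String rawPayload, boolean simulateFailure) {
        long eventId = nextEventId;
        String event = "customer.changed#" + eventId;
        OutboxRow row = new OutboxRow(eventId, rawPayload);
        if (simulateFailure) {
            return false; // "rollback": nothing prepared above was stored
        }
        rawRecords.add(rawPayload);
        sourceChangeEvents.add(event);
        pendingOutbox.add(row);
        nextEventId++;
        return true;
    }

    // Publisher loop body: a row leaves the outbox only after the broker
    // accepted it, so a failed send is retried on the next call.
    synchronized int publishPending(Consumer<OutboxRow> broker) {
        int sent = 0;
        var it = pendingOutbox.iterator();
        while (it.hasNext()) {
            OutboxRow row = it.next();
            try {
                broker.accept(row);
            } catch (RuntimeException e) {
                break; // keep this row and later rows, preserving order
            }
            it.remove();
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        OutboxSketch box = new OutboxSketch();
        box.ingest("{\"customer\":\"a\"}", false);
        box.ingest("{\"customer\":\"b\"}", true); // rolled back, never published
        int sent = box.publishPending(row -> System.out.println("publish " + row));
        System.out.println("sent=" + sent + " pending=" + box.pendingOutbox.size());
    }
}
```

The design point is that downstream consumers only ever see events whose raw record was committed, and a broker outage delays publication instead of losing events.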

Detailed local runbook

For the full command reference and the longer local walkthrough, see docs/05-local-runbook.md.

The shortest practical local validation loop is:

start infrastructure

docker compose up -d postgres redpanda

start the application

mvn spring-boot:run

verify the main endpoints

http://localhost:8080/admin
http://localhost:8080/.well-known/openid-configuration
http://localhost:8080/api/customers?limit=10

For Admin Console browser checks and responsive screenshots after the application is running:

npx playwright install chromium
node scripts/admin-console-verify.mjs

run tests

mvn test

if you want the full runnable data flow, use the detailed runbook to:

ingest mock customers and trades
normalize customer and order records
build local Search / Vector-like indexes
exercise Admin Console, MDM, replay, and backfill flows

For the most accurate statement of implemented boundaries, rely on docs/02-implementation-status.md.
