CodeGraphPlus Extension Guide

This document explains how to extend CodeGraphPlus: implementing Indexer plugins, establishing asset relationships, supporting project groups and multi-repository setups, and modifying the scheduler layer to enable real indexing.

Scope: The examples and scanning rules below target Java backend projects (Spring Boot, Maven/Gradle, Flyway, JPA, MyBatis, OpenFeign, etc.). Node / Python / Go stacks are outside the current Indexer design scope.


Table of Contents

  1. Extension model overview
  2. External Indexer scripts (recommended)
  3. Indexer plugin contract
  4. Writing assets and relations
  5. Extension: database tables (db-table)
  6. Extension: HTTP API (openapi)
  7. Extension: Redis Key (redis)
  8. Extension: message Topics (topic)
  9. Extension: thread pools (thread-pool)
  10. Extension: HTTP dependencies (http-dependency)
  11. Extension: cross-asset relations (cross-link)
  12. Extension: live database (live DB)
  13. Extension: adding a new Indexer plugin
  14. Extension: MCP tools
  15. Extension: cross-project / project group scenarios
  16. Enabling real indexing: scheduler changes
  17. Naming and data conventions
  18. Testing and validation
  19. Recommended implementation order
  20. Agent enrichment (Enrichment)

Extension model overview

target-project/
  .codegraphplus/
    codegraphplus.db          ← SQLite (assets + relations + files + metadata)
  (source code, config, migration files, etc.)

codegraphplus init -i
       │
       ▼
indexProject()  ──►  each IndexerPlugin
       │                  │
       │            discover()  scan candidate files
       │            indexFile()  parse and write
       ▼
QueryBuilder.upsertAsset / addRelation
       │
       ▼
MCP tools (search / explore / node / …)
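The flow above can be sketched as a plain driver loop. This is a minimal sketch, not the real indexProject: MemoryQueries is a hypothetical in-memory stand-in for QueryBuilder, runPlugins is an illustrative name, and the interfaces are local mirrors of the contract described later in this guide.

```typescript
// Local mirrors of the shapes described in this guide (assumed, simplified).
interface AssetInput {
  kind: string;
  name: string;
  qualifiedName: string;
  sourceFile: string;
  definitionJson: string;
  metadataJson: string;
}

interface IndexerResult {
  assetsAdded: number;
  relationsAdded: number;
  skipped: boolean;
}

// Hypothetical in-memory stand-in for QueryBuilder.
class MemoryQueries {
  assets: AssetInput[] = [];
  upsertAsset(a: AssetInput): number {
    this.assets.push(a);
    return this.assets.length;
  }
}

interface Plugin {
  name: string;
  discover(root: string): string[];
  indexFile(q: MemoryQueries, filePath: string, root: string): IndexerResult;
}

// For every plugin: discover candidate files, then index each one and sum the results.
function runPlugins(plugins: Plugin[], q: MemoryQueries, root: string) {
  let filesIndexed = 0;
  let assetsAdded = 0;
  for (const plugin of plugins) {
    for (const file of plugin.discover(root)) {
      const result = plugin.indexFile(q, file, root);
      filesIndexed++;
      assetsAdded += result.assetsAdded;
    }
  }
  return { filesIndexed, assetsAdded };
}

// Example plugin that "finds" one fixed file and writes one asset for it.
const demo: Plugin = {
  name: "demo",
  discover: (root) => [`${root}/V1__orders.sql`],
  indexFile(q) {
    q.upsertAsset({
      kind: "db_table",
      name: "orders",
      qualifiedName: "demo.public.orders",
      sourceFile: "V1__orders.sql",
      definitionJson: "{}",
      metadataJson: "{}",
    });
    return { assetsAdded: 1, relationsAdded: 0, skipped: false };
  },
};
```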

Current state: all built-in plugins have stub indexFile implementations. They only track file hashes and do not write assets. Teams can write real scanning logic as external TS/JS scripts in .codegraphplus/plugins/ without modifying CodeGraphPlus source code. After implementing parsing logic, you must also modify the scheduler layer in src/indexer/index.ts (see Enabling real indexing).


External Indexer scripts (recommended)

Directory structure

your-service/
  .codegraphplus/
    indexers.config.json     ← one script path per scan slot
    plugins/
      openapi.ts
      db-table.ts            ← database table scanning
      redis.ts
      topic.ts
      thread-pool.ts
      http-dependency.ts
    codegraphplus.db

During init, the config file above, the plugins/ directory, and a TS template file for each slot are created automatically.

indexers.config.json (one script per slot)

{
  "openapi": "plugins/openapi.ts",
  "db-table": "plugins/db-table.ts",
  "redis": null,
  "topic": null,
  "thread-pool": "plugins/thread-pool.ts",
  "http-dependency": null
}
  • Each key is a fixed scan slot; each value is the external script path for that slot (relative to .codegraphplus/), or null to use the CodeGraphPlus built-in stub.
  • When a path is specified, the corresponding built-in plugin no longer participates in indexing; the script must implement discover + indexFile.
| Slot key | Default script | Output kind |
| --- | --- | --- |
| openapi | plugins/openapi.ts | http_api, dto_schema |
| db-table | plugins/db-table.ts | db_table |
| redis | plugins/redis.ts | redis_key |
| topic | plugins/topic.ts | topic |
| thread-pool | plugins/thread-pool.ts | thread_pool |
| http-dependency | plugins/http-dependency.ts | http_dependency |
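A minimal sketch of how the slot rules above could be resolved, assuming a missing key behaves like null. resolveSlots and SlotResolution are hypothetical names, not part of the CodeGraphPlus API; the output mirrors the slot → source → script view printed by codegraphplus plugins.

```typescript
import * as path from "path";

// Slot keys listed in this guide.
const SLOTS = ["openapi", "db-table", "redis", "topic", "thread-pool", "http-dependency"] as const;
type Slot = (typeof SLOTS)[number];

interface SlotResolution {
  slot: Slot;
  source: "built-in" | "external";
  script: string | null; // absolute path for external scripts
}

// null or missing → built-in stub; string → script path relative to .codegraphplus/.
function resolveSlots(
  config: Partial<Record<Slot, string | null>>,
  codegraphplusDir: string,
): SlotResolution[] {
  return SLOTS.map((slot): SlotResolution => {
    const value = config[slot];
    if (value == null) return { slot, source: "built-in", script: null };
    return { slot, source: "external", script: path.resolve(codegraphplusDir, value) };
  });
}
```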

Script template (TypeScript)

// .codegraphplus/plugins/db-table.ts
import * as fs from 'fs';
import * as path from 'path';
import type { IndexerPlugin } from 'codegraphplus/plugin-api';

const plugin: IndexerPlugin = {
  name: 'db-table',
  discover(projectRoot: string): string[] {
    const dir = path.join(projectRoot, 'src/main/resources/db/migration');
    if (!fs.existsSync(dir)) return [];
    return fs.readdirSync(dir)
      .filter((f) => f.endsWith('.sql'))
      .map((f) => path.join(dir, f));
  },
  indexFile(queries, filePath, projectRoot) {
    const rel = path.relative(projectRoot, filePath).replace(/\\/g, '/');
    queries.clearSourceFile(rel);
    queries.upsertAsset({
      kind: 'db_table',
      name: 'orders',
      qualifiedName: 'demo.public.orders',
      sourceFile: rel,
      definitionJson: '{}',
      metadataJson: JSON.stringify({ parser: 'db-table' }),
    });
    return { assetsAdded: 1, relationsAdded: 0, skipped: false };
  },
};

export default plugin;

JavaScript equivalent: module.exports = { name, discover, indexFile }.

Loading and CLI

  • codegraphplus index: loads per slot from config; null → built-in, path → external script.
  • codegraphplus plugins [path]: table showing slot → source → script.
  • .ts files are transpiled at runtime (requires the typescript package); you can also precompile them to .js.
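Because a script may use either export default (TypeScript/ESM) or module.exports (CommonJS), a loader has to normalize what it receives. The sketch below assumes the module object has already been loaded (for example via require after transpiling); normalizePluginModule is a hypothetical helper, not the actual CodeGraphPlus loader.

```typescript
// Shape a loaded plugin must have (mirrors the IndexerPlugin contract in this guide).
interface LoadedPlugin {
  name: string;
  discover: (root: string) => string[];
  indexFile: (...args: unknown[]) => unknown;
}

// Accept both `export default plugin` and `module.exports = { ... }`, then validate.
function normalizePluginModule(mod: unknown, scriptPath: string): LoadedPlugin {
  const candidate =
    mod !== null && typeof mod === "object" && "default" in mod
      ? (mod as { default: unknown }).default
      : mod;
  const p = candidate as Partial<LoadedPlugin> | null;
  if (
    !p ||
    typeof p.name !== "string" ||
    typeof p.discover !== "function" ||
    typeof p.indexFile !== "function"
  ) {
    throw new Error(`${scriptPath}: plugin must export name, discover and indexFile`);
  }
  return p as LoadedPlugin;
}
```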

Comparison with modifying built-in plugins

| Approach | Location | Best for |
| --- | --- | --- |
| External scripts | .codegraphplus/plugins/ | Single-project or team-private rules, versioned with the business repo (committed or gitignored) |
| Modify built-in plugins | src/indexer/plugins/ | Upstream contributions, a company-wide unified toolchain |

Indexer plugin contract

Defined in src/indexer/index.ts:

export interface IndexerPlugin {
  /** Plugin name; stored in files.plugin for codegraphplus_files filtering */
  name: string;
  /** Discover candidate files under project root (absolute paths) */
  discover(root: string): string[];
  /** Parse a single file; write assets / relations */
  indexFile(
    queries: QueryBuilder,
    filePath: string,
    projectRoot: string,
  ): IndexerResult;
}

export interface IndexerResult {
  assetsAdded: number;
  relationsAdded: number;
  skipped: boolean;   // true = this file produced no new assets
}

Built-in plugin registry (same file, PLUGINS array):

| Plugin | File | Target kind |
| --- | --- | --- |
| openapi | plugins/openapi.ts | http_api, dto_schema |
| db-table | plugins/db-table.ts | db_table |
| redis | plugins/redis.ts | redis_key |
| topic | plugins/topic.ts | topic |
| thread-pool | plugins/thread-pool.ts | thread_pool |
| http-dependency | plugins/http-dependency.ts | http_dependency |

Writing assets and relations

Assets

Written via QueryBuilder.upsertAsset:

const id = queries.upsertAsset({
  kind: 'db_table',                    // AssetKind
  name: 'orders',                      // short name
  qualifiedName: 'order-service.public.orders',  // unique together with kind + sourceFile
  sourceFile: 'services/order/migrations/V1__orders.sql',  // relative to project root
  definitionJson: JSON.stringify({ columns: [{ name: 'id', type: 'bigint' }] }),
  metadataJson: JSON.stringify({ summary: '12 columns', service: 'order-service' }),
});

Unique constraint: (kind, qualified_name, source_file). Before re-scanning the same file, call queries.clearSourceFile(relPath) to remove old assets and edges produced by that file.
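To make the upsert key and the clear-before-rescan rule concrete, here is an in-memory sketch. AssetStore is hypothetical; it assumes each edge records the file that produced it, and it also drops edges pointing at removed assets, which may be stricter than the real clearSourceFile.

```typescript
interface Asset {
  id: number;
  kind: string;
  qualifiedName: string;
  sourceFile: string;
}

interface Edge {
  relation: string;
  fromId: number;
  toId: number;
  sourceFile: string; // file whose indexing produced the edge (assumed bookkeeping)
}

class AssetStore {
  private nextId = 1;
  assets = new Map<string, Asset>(); // key = kind|qualifiedName|sourceFile
  edges: Edge[] = [];

  // Insert, or return the existing id for the same (kind, qualifiedName, sourceFile).
  upsertAsset(kind: string, qualifiedName: string, sourceFile: string): number {
    const key = `${kind}|${qualifiedName}|${sourceFile}`;
    const existing = this.assets.get(key);
    if (existing) return existing.id;
    const asset: Asset = { id: this.nextId++, kind, qualifiedName, sourceFile };
    this.assets.set(key, asset);
    return asset.id;
  }

  addRelation(relation: string, fromId: number, toId: number, sourceFile: string): void {
    this.edges.push({ relation, fromId, toId, sourceFile });
  }

  // Remove every asset the file produced, edges it produced, and edges touching removed assets.
  clearSourceFile(sourceFile: string): void {
    const removed = new Set<number>();
    for (const [key, a] of Array.from(this.assets)) {
      if (a.sourceFile === sourceFile) {
        removed.add(a.id);
        this.assets.delete(key);
      }
    }
    this.edges = this.edges.filter(
      (e) => e.sourceFile !== sourceFile && !removed.has(e.fromId) && !removed.has(e.toId),
    );
  }
}
```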

Relations

Written via queries.addRelation(relation, fromId, toId):

| relation | Meaning | Example |
| --- | --- | --- |
| uses_table | API / job reads or writes a table | http_api → db_table |
| uses_redis | Uses a Redis Key | http_api → redis_key |
| publishes | Publishes to a Topic | http_api → topic |
| subscribes | Subscribes to a Topic | http_api → topic |
| uses_schema | Uses a DTO | http_api → dto_schema |
| calls_http | Calls another HTTP endpoint | http_api → http_dependency |
| runs_on_pool | Uses a thread pool | http_api → thread_pool |
| references | Generic reference | dto_schema → dto_schema |

Type definitions are in src/types.ts.


Extension: database tables (db-table)

Scenario: Scan database tables in Java projects (Flyway/Liquibase migrations, JPA Entity, MyBatis XML, DDL exports).

Entry point: src/indexer/plugins/db-table.ts

1. Enhance discover

Add path rules for monorepo / multi-service layouts:

const PATTERNS = [
  /migrations?\//i,
  /flyway/i,
  /liquibase/i,
  /db\/changelog/i,
  /\.sql$/i,
  /V\d+__.*\.sql$/i,              // Flyway: V1__create_orders.sql
];

function isDbFile(filePath: string): boolean {
  const rel = filePath.replace(/\\/g, '/');
  if (PATTERNS.some((p) => p.test(rel))) return true;
  // JPA Entity
  if (/\.java$/i.test(rel) && /entity|model|domain/i.test(rel)) return true;
  // MyBatis mapper XML
  if (/\.xml$/i.test(rel) && /mapper/i.test(rel)) return true;
  return false;
}
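isDbFile only classifies a path; discover still has to walk the tree. A minimal recursive sketch follows, assuming the patterns are matched against the root-relative path and that build-output directories should be ignored (the SKIP_DIRS list is an assumption).

```typescript
import * as fs from "fs";
import * as path from "path";

const PATTERNS = [/migrations?\//i, /flyway/i, /liquibase/i, /db\/changelog/i, /\.sql$/i];

function isDbFile(rel: string): boolean {
  if (PATTERNS.some((p) => p.test(rel))) return true;
  if (/\.java$/i.test(rel) && /entity|model|domain/i.test(rel)) return true; // JPA Entity
  if (/\.xml$/i.test(rel) && /mapper/i.test(rel)) return true; // MyBatis mapper XML
  return false;
}

// Directories that should never be scanned (assumed list).
const SKIP_DIRS = new Set([".git", ".codegraphplus", "node_modules", "target", "build", ".gradle"]);

// Return absolute paths whose root-relative, forward-slash path matches isDbFile.
function discover(root: string): string[] {
  const out: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        if (!SKIP_DIRS.has(entry.name)) walk(full);
      } else if (entry.isFile()) {
        const rel = path.relative(root, full).replace(/\\/g, "/");
        if (isDbFile(rel)) out.push(full);
      }
    }
  };
  walk(root);
  return out.sort();
}
```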

2. Implement indexFile

Recommended: split into standalone parser modules:

src/indexer/
  parsers/
    sql-create-table.ts    # CREATE TABLE statements
    flyway.ts
    jpa-entity.ts          # @Entity / @Table
    mybatis-mapper.ts      # table names in XML
  utils/
    infer-service.ts       # infer service name from path
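The layout does not show infer-service.ts itself. One possible sketch, assuming services live under services/, apps/ or modules/, or are Maven/Gradle modules that contain src/; the directory names and the fallback value are assumptions to adapt to your repository.

```typescript
// Hypothetical layout rules; adjust to your monorepo.
const SERVICE_ROOTS = ["services", "apps", "modules"];

function inferServiceName(relPath: string, fallback = "default"): string {
  const parts = relPath.replace(/\\/g, "/").split("/").filter(Boolean);
  // services/<name>/... : the segment after the root, as long as it is a directory
  const i = parts.findIndex((p) => SERVICE_ROOTS.includes(p));
  if (i >= 0 && i + 1 < parts.length - 1) return parts[i + 1];
  // Maven/Gradle multi-module: <module>/src/main/... → <module>
  const src = parts.indexOf("src");
  if (src > 0) return parts[src - 1];
  return fallback;
}
```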

Example skeleton:

indexFile(queries, filePath, projectRoot): IndexerResult {
  const rel = path.relative(projectRoot, filePath).replace(/\\/g, '/');
  const service = inferServiceName(rel);  // e.g. order-service

  queries.clearSourceFile(rel);

  const tables = dispatchParse(filePath);  // select parser by extension
  let assetsAdded = 0;

  for (const table of tables) {
    queries.upsertAsset({
      kind: 'db_table',
      name: table.name,
      qualifiedName: `${service}.${table.schema ?? 'public'}.${table.name}`,
      sourceFile: rel,
      definitionJson: JSON.stringify({
        columns: table.columns,
        primaryKey: table.primaryKey,
        indexes: table.indexes,
      }),
      metadataJson: JSON.stringify({
        summary: `${table.columns.length} columns`,
        service,
        dialect: table.dialect,
      }),
    });
    assetsAdded++;
  }

  return { assetsAdded, relationsAdded: 0, skipped: assetsAdded === 0 };
}

3. Minimal SQL migration parser example

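function parseCreateTables(sql: string): ParsedTable[] {
  const tables: ParsedTable[] = [];
  // Optional IF NOT EXISTS and optional schema prefix (public.orders), then the column list up to ");"
  const re = /CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(?:[`"']?(\w+)[`"']?\.)?[`"']?(\w+)[`"']?\s*\(([\s\S]*?)\)\s*;/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(sql)) !== null) {
    tables.push({ schema: m[1], name: m[2], columns: parseColumns(m[3]) });
  }
  return tables;
}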

For production, replace with a library such as node-sql-parser.
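parseColumns is referenced above but not defined. A minimal sketch, assuming one column definition per top-level comma and skipping table-level constraints; multi-word types such as DOUBLE PRECISION are cut to their first word, which is another reason to move to a real SQL parser.

```typescript
interface ParsedColumn {
  name: string;
  type: string;
  nullable: boolean;
}

// Split on commas that are not nested inside parentheses, e.g. DECIMAL(10,2).
function splitTopLevel(body: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let current = "";
  for (const ch of body) {
    if (ch === "(") depth++;
    if (ch === ")") depth--;
    if (ch === "," && depth === 0) {
      parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts.map((p) => p.trim()).filter((p) => p.length > 0);
}

// Table-level constraints start with one of these keywords and are skipped.
const CONSTRAINT = /^(PRIMARY\s+KEY|FOREIGN\s+KEY|UNIQUE|KEY|INDEX|CONSTRAINT|CHECK)\b/i;

function parseColumns(body: string): ParsedColumn[] {
  const columns: ParsedColumn[] = [];
  for (const def of splitTopLevel(body)) {
    if (CONSTRAINT.test(def)) continue;
    // name, then a type with optional (size) / (precision, scale)
    const m = /^[`"']?(\w+)[`"']?\s+(\w+(?:\s*\([^)]*\))?)/.exec(def);
    if (!m) continue;
    columns.push({
      name: m[1],
      type: m[2].replace(/\s+/g, ""),
      nullable: !/\bNOT\s+NULL\b/i.test(def) && !/\bPRIMARY\s+KEY\b/i.test(def),
    });
  }
  return columns;
}
```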

4. Link to APIs

During the openapi or cross-link phase:

queries.addRelation('uses_table', httpApiAssetId, dbTableAssetId);

Ways to infer the link: co-occurrence in the same module, table names in MyBatis XML, JPA associations such as @ManyToOne, Repository injection, etc.
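For the MyBatis XML signal, a rough heuristic is to collect identifiers after FROM, JOIN, INTO and UPDATE. tablesInMapperXml is a hypothetical helper; it ignores dynamic SQL semantics (<if>, <include>) and can produce false positives, so treat its output as candidates for uses_table edges rather than facts.

```typescript
// Strip XML comments and CDATA markers, then collect identifiers after FROM/JOIN/INTO/UPDATE.
function tablesInMapperXml(xml: string): string[] {
  const sql = xml
    .replace(/<!--[\s\S]*?-->/g, " ")
    .replace(/<!\[CDATA\[|\]\]>/g, " ");
  const re = /\b(?:FROM|JOIN|INTO|UPDATE)\s+[`"]?(\w+(?:\.\w+)?)[`"]?/gi;
  const found = new Set<string>();
  let m: RegExpExecArray | null;
  while ((m = re.exec(sql)) !== null) {
    found.add(m[1].toLowerCase());
  }
  return Array.from(found).sort();
}
```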


Extension: HTTP API (openapi)

Entry point: src/indexer/plugins/openapi.ts

discover (existing)

Scans openapi.yaml, swagger.json, docs/api/, etc.

indexFile implementation notes

  1. Parse YAML/JSON (use yaml + JSON.parse)
  2. Iterate paths.*.{get,post,put,patch,delete} → kind: http_api
    • qualifiedName: GET /users/{id} or {specTitle}::listUsers
    • metadataJson: { summary, tags, operationId }
  3. Iterate components.schemas → kind: dto_schema
  4. Extract uses_schema edges from $ref
// operation → schema
queries.addRelation('uses_schema', endpointId, schemaId);
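Steps 2 to 4 can be sketched over an already-parsed spec object. extractEndpoints is a hypothetical helper; it assumes OpenAPI 3 component references (#/components/schemas/...) and ignores path-level parameters. Swagger 2 specs use #/definitions/ instead and would need a different prefix.

```typescript
type Json = Record<string, any>;

interface Endpoint {
  qualifiedName: string; // e.g. "GET /users/{id}"
  operationId?: string;
  summary?: string;
  tags: string[];
  schemaRefs: string[]; // schema names referenced anywhere in the operation
}

const METHODS = ["get", "post", "put", "patch", "delete"];
const SCHEMA_PREFIX = "#/components/schemas/";

// Collect every $ref that points into components.schemas, at any depth.
function collectSchemaRefs(node: unknown, out: Set<string>): void {
  if (Array.isArray(node)) {
    node.forEach((n) => collectSchemaRefs(n, out));
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node as Json)) {
      if (key === "$ref" && typeof value === "string" && value.startsWith(SCHEMA_PREFIX)) {
        out.add(value.slice(SCHEMA_PREFIX.length));
      } else {
        collectSchemaRefs(value, out);
      }
    }
  }
}

function extractEndpoints(spec: Json): Endpoint[] {
  const endpoints: Endpoint[] = [];
  for (const [route, item] of Object.entries<Json>(spec.paths ?? {})) {
    for (const method of METHODS) {
      const op: Json | undefined = item[method];
      if (!op) continue;
      const refs = new Set<string>();
      collectSchemaRefs(op, refs);
      endpoints.push({
        qualifiedName: `${method.toUpperCase()} ${route}`,
        operationId: op.operationId,
        summary: op.summary,
        tags: op.tags ?? [],
        schemaRefs: Array.from(refs).sort(),
      });
    }
  }
  return endpoints;
}
```

Each schemaRefs entry then becomes one uses_schema edge from the endpoint to the matching dto_schema asset.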

Code routes (when no OpenAPI)

Add plugins/spring-routes.ts to scan Spring MVC annotations:

| Source | Scan target | Parsing approach |
| --- | --- | --- |
| Spring MVC | @GetMapping, @PostMapping, @RequestMapping | regex or Java AST |
| Spring WebFlux | @GetMapping, etc. (RouterFunction as an optional extension) | regex |

Register it in the PLUGINS array.
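A regex-based sketch of the Spring MVC scan, assuming conventional formatting: a class-level @RequestMapping prefix, one mapping annotation per handler, and the path as the first string literal. scanSpringRoutes is hypothetical; method-level @RequestMapping(method = ...) and multi-path arrays are not handled, which is where a Java AST pays off.

```typescript
interface Route {
  method: string;
  path: string;
  handler: string;
}

// First string literal inside annotation arguments: ("/x") or (value = "/x").
function firstPath(args: string | undefined): string {
  const m = args ? /"([^"]*)"/.exec(args) : null;
  return m ? m[1] : "";
}

// Join class prefix and method path, collapsing duplicate and trailing slashes.
function joinPaths(prefix: string, sub: string): string {
  const joined = `/${prefix}/${sub}`.replace(/\/+/g, "/");
  return joined.length > 1 ? joined.replace(/\/$/, "") : joined;
}

function scanSpringRoutes(source: string): Route[] {
  // Class-level @RequestMapping (other annotations may follow it) gives the path prefix.
  const classRe =
    /@RequestMapping\s*(?:\(([^)]*)\))?(?:\s*@\w+(?:\([^)]*\))?)*\s*(?:public\s+)?(?:final\s+)?class\s+\w+/;
  const classMatch = classRe.exec(source);
  const prefix = classMatch ? firstPath(classMatch[1]) : "";
  const body = classMatch ? source.slice(classMatch.index + classMatch[0].length) : source;

  // Each mapping annotation; the first "name(" after it is taken as the handler method.
  const re = /@(Get|Post|Put|Patch|Delete)Mapping\s*(?:\(([^)]*)\))?[\s\S]*?\s(\w+)\s*\(/g;
  const routes: Route[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(body)) !== null) {
    routes.push({ method: m[1].toUpperCase(), path: joinPaths(prefix, firstPath(m[2])), handler: m[3] });
  }
  return routes;
}
```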


Extension: Redis Key (redis)

Entry point: src/indexer/plugins/redis.ts

Scan sources

  • Redis config sections in application.yml / application.properties
  • Key constants in code: RedisKey.USER_SESSION = "session:{userId}"
  • Key conventions in comments or documentation

Write example

queries.upsertAsset({
  kind: 'redis_key',
  name: 'user-session',
  qualifiedName: 'order-service:session:{userId}',
  sourceFile: rel,
  definitionJson: JSON.stringify({ pattern: 'session:{userId}', ttl: 3600 }),
  metadataJson: JSON.stringify({ summary: 'User session cache' }),
});

Relations

queries.addRelation('uses_redis', httpApiId, redisKeyId);

Extension: message Topics (topic)

Entry point: src/indexer/plugins/topic.ts

Scan sources

  • Kafka: @KafkaListener, ProducerRecord, spring.kafka in application.yml
  • RabbitMQ: @RabbitListener, Queue declarations
  • Redis Stream: XADD / stream key configuration
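For the listener side, a sketch that pulls topic and queue names out of @KafkaListener(topics = ...) and @RabbitListener(queues = ...). findListenerTopics is hypothetical; it does not resolve ${...} property placeholders against application.yml and does not cover publishers (ProducerRecord, template send calls).

```typescript
interface TopicRef {
  broker: "kafka" | "rabbitmq";
  name: string;
  direction: "subscribe";
}

function findListenerTopics(javaSource: string): TopicRef[] {
  const out: TopicRef[] = [];
  const re = /@(KafkaListener|RabbitListener)\s*\(([^)]*)\)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(javaSource)) !== null) {
    const broker = m[1] === "KafkaListener" ? "kafka" : "rabbitmq";
    const attr = broker === "kafka" ? "topics" : "queues";
    // Value is either a single "literal" or an array {"a", "b"}.
    const valueMatch = new RegExp(`${attr}\\s*=\\s*(\\{[^}]*\\}|"[^"]*")`).exec(m[2]);
    if (!valueMatch) continue;
    for (const lit of valueMatch[1].match(/"[^"]*"/g) ?? []) {
      out.push({ broker, name: lit.slice(1, -1), direction: "subscribe" });
    }
  }
  return out;
}
```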

Write example

queries.upsertAsset({
  kind: 'topic',
  name: 'order.created',
  qualifiedName: 'order-service:order.created',
  sourceFile: rel,
  definitionJson: JSON.stringify({ broker: 'kafka', direction: 'publish' }),
  metadataJson: JSON.stringify({ summary: 'Order created event' }),
});

Relations

queries.addRelation('publishes', httpApiId, topicId);
queries.addRelation('subscribes', consumerApiId, topicId);

Extension: thread pools (thread-pool)

Entry point: src/indexer/plugins/thread-pool.ts

Scan sources

  • Spring: @EnableAsync, ThreadPoolTaskExecutor Bean definitions
  • Java: ExecutorService, Executors.newFixedThreadPool
  • Config: async.executor.core-pool-size, etc.
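For config-defined pools, a sketch that reads application.properties and extracts sizes under a given prefix. The core-pool-size / max-pool-size / queue-capacity key names follow the async.executor.core-pool-size example above but are otherwise assumptions; application.yml would need a YAML parser first.

```typescript
interface PoolDefinition {
  corePoolSize?: number;
  maxPoolSize?: number;
  queueCapacity?: number;
}

// Parse simple key=value (or key:value) lines; comments and blank lines are skipped.
function parseProperties(text: string): Map<string, string> {
  const props = new Map<string, string>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line || line.startsWith("#") || line.startsWith("!")) continue;
    const idx = line.search(/[=:]/);
    if (idx < 0) continue;
    props.set(line.slice(0, idx).trim(), line.slice(idx + 1).trim());
  }
  return props;
}

// Read numeric pool settings under <prefix>.*; non-numeric values are left undefined.
function poolFromProperties(props: Map<string, string>, prefix: string): PoolDefinition {
  const num = (key: string): number | undefined => {
    const v = props.get(`${prefix}.${key}`);
    return v !== undefined && /^\d+$/.test(v) ? Number(v) : undefined;
  };
  return {
    corePoolSize: num("core-pool-size"),
    maxPoolSize: num("max-pool-size"),
    queueCapacity: num("queue-capacity"),
  };
}
```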

Write example

queries.upsertAsset({
  kind: 'thread_pool',
  name: 'asyncExecutor',
  qualifiedName: 'order-service:asyncExecutor',
  sourceFile: rel,
  definitionJson: JSON.stringify({ corePoolSize: 8, maxPoolSize: 16, queueCapacity: 100 }),
  metadataJson: JSON.stringify({ summary: 'Async task executor' }),
});

Relations

queries.addRelation('runs_on_pool', httpApiId, poolId);

Extension: HTTP dependencies (http-dependency)

Entry point: src/indexer/plugins/http-dependency.ts

Scan sources

  • OpenFeign: @FeignClient(name = "payment-service")
  • RestTemplate / WebClient call URLs
  • External service baseURLs declared in application.yml
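For OpenFeign, a sketch that reads the client name from @FeignClient and each mapped method, building qualifiedName in the {caller}→{target}:{method} {path} format used in this guide. scanFeignClient is hypothetical and assumes name/value is the first annotation attribute and that the interface has no class-level path prefix.

```typescript
interface HttpDependency {
  targetService: string;
  method: string;
  path: string;
  qualifiedName: string;
}

function scanFeignClient(javaSource: string, callerService: string): HttpDependency[] {
  const client = /@FeignClient\s*\(\s*(?:(?:name|value)\s*=\s*)?"([^"]+)"/.exec(javaSource);
  if (!client) return [];
  const target = client[1];
  const out: HttpDependency[] = [];
  const re = /@(Get|Post|Put|Patch|Delete)Mapping\s*\(\s*(?:(?:value|path)\s*=\s*)?"([^"]*)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(javaSource)) !== null) {
    const method = m[1].toUpperCase();
    const routePath = m[2];
    out.push({
      targetService: target,
      method,
      path: routePath,
      qualifiedName: `${callerService}→${target}:${method} ${routePath}`,
    });
  }
  return out;
}
```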

Write example

queries.upsertAsset({
  kind: 'http_dependency',
  name: 'charge',
  qualifiedName: 'order-service→payment-service:POST /charge',
  sourceFile: rel,
  definitionJson: JSON.stringify({
    targetService: 'payment-service',
    method: 'POST',
    path: '/charge',
  }),
  metadataJson: JSON.stringify({ client: 'feign' }),
});

Relations

queries.addRelation('calls_http', callerHttpApiId, dependencyId);

Extension: cross-asset relations (cross-link)

A single-file parser can usually only build relationships within that file. For relationships that span files, add a second-phase indexer:

src/indexer/
  cross-link/
    api-to-table.ts      # API handler ↔ Entity / MyBatis
    api-to-topic.ts
    api-to-redis.ts

Run at the end of indexProject, after all plugins finish:

for (const linker of CROSS_LINKERS) {
  linker.link(queries, projectRoot);
}

Inside link(), read the assets that have already been written, match them by rules, and call addRelation; no source files are parsed in this phase.
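A sketch of one second-phase linker. Unlike the link(queries, projectRoot) signature above, this hypothetical version takes the already-written assets and returns relations, which keeps the rule testable; the caller would pass each result to addRelation. The metadata.tables field on http_api assets is an assumption (it could be filled from MyBatis XML, for example).

```typescript
interface StoredAsset {
  id: number;
  kind: string;
  qualifiedName: string;
  metadata: { service?: string; tables?: string[] };
}

interface Relation {
  relation: string;
  fromId: number;
  toId: number;
}

interface CrossLinker {
  name: string;
  link(assets: StoredAsset[]): Relation[];
}

// Hypothetical rule: link an http_api to a db_table it lists in metadata.tables,
// but only within the same service.
const apiToTable: CrossLinker = {
  name: "api-to-table",
  link(assets) {
    const tables = assets.filter((a) => a.kind === "db_table");
    const relations: Relation[] = [];
    for (const api of assets.filter((a) => a.kind === "http_api")) {
      for (const tableName of api.metadata.tables ?? []) {
        const target = tables.find(
          (t) =>
            t.metadata.service === api.metadata.service &&
            t.qualifiedName.endsWith(`.${tableName}`),
        );
        if (target) relations.push({ relation: "uses_table", fromId: api.id, toId: target.id });
      }
    }
    return relations;
  },
};
```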


Extension: live database (live DB)

This covers project groups that scan databases centrally, where the source code lacks complete DDL. Instead of discovering files, the indexer reads the project configuration.

Configuration example

Create .codegraphplus/config.json in the target project. Do not commit versions that contain passwords: add the file to .gitignore or use environment variable placeholders:

{
  "databases": [
    {
      "name": "order-db",
      "service": "order-service",
      "url": "${ORDER_DB_URL}",
      "schemas": ["public"]
    }
  ]
}
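Placeholders like ${ORDER_DB_URL} can be expanded at connection time so the real DSN never reaches config.json or codegraphplus.db. resolveDsn is a hypothetical helper; in real use, pass process.env as env.

```typescript
interface DatabaseConfig {
  name: string;
  service: string;
  url: string;
  schemas: string[];
}

// Replace ${VAR} placeholders from env; the expanded DSN lives only in memory.
function resolveDsn(db: DatabaseConfig, env: Record<string, string | undefined>): string {
  return db.url.replace(/\$\{(\w+)\}/g, (_match, name: string) => {
    const value = env[name];
    if (value === undefined) {
      throw new Error(`database "${db.name}": environment variable ${name} is not set`);
    }
    return value;
  });
}
```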

Implementation approach

  1. Add plugins/db-live.ts or extend db-table.ts:
    • discover() returns []
    • Add an indexLive(queries, config) entry point that is called separately from indexProject
  2. Connect to the database and query information_schema / pg_catalog
  3. Write db_table assets with sourceFile set to __live__:order-db and
     qualifiedName: `${service}.${schema}.${tableName}`
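Step 3 can be sketched as a pure mapping from information_schema.columns rows to db_table assets. The query in the comment is PostgreSQL-flavored, rowsToAssets is a hypothetical helper, and the driver and connection code are omitted.

```typescript
// Row shape of a query such as:
//   SELECT table_schema, table_name, column_name, data_type, is_nullable
//   FROM information_schema.columns WHERE table_schema = ANY($1)
//   ORDER BY table_schema, table_name, ordinal_position
interface ColumnRow {
  table_schema: string;
  table_name: string;
  column_name: string;
  data_type: string;
  is_nullable: string; // "YES" or "NO"
}

interface LiveTableAsset {
  kind: "db_table";
  name: string;
  qualifiedName: string;
  sourceFile: string;
  definitionJson: string;
  metadataJson: string;
}

// Group column rows per table, then emit one asset per table.
function rowsToAssets(dbName: string, service: string, rows: ColumnRow[]): LiveTableAsset[] {
  const byTable = new Map<string, ColumnRow[]>();
  for (const row of rows) {
    const key = `${row.table_schema}.${row.table_name}`;
    const list = byTable.get(key) ?? [];
    list.push(row);
    byTable.set(key, list);
  }
  return Array.from(byTable.values()).map((cols): LiveTableAsset => {
    const { table_schema: schema, table_name: table } = cols[0];
    return {
      kind: "db_table",
      name: table,
      qualifiedName: `${service}.${schema}.${table}`,
      sourceFile: `__live__:${dbName}`,
      definitionJson: JSON.stringify({
        columns: cols.map((c) => ({ name: c.column_name, type: c.data_type, nullable: c.is_nullable === "YES" })),
      }),
      metadataJson: JSON.stringify({ summary: `${cols.length} columns`, service, source: "live" }),
    };
  });
}
```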

Notes

  • Do not write DSNs into codegraphplus.db
  • Live DB indexer bypasses file hash cache; each codegraphplus index fully refreshes live assets

Extension: adding a new Indexer plugin

  1. Create my-plugin.ts in src/indexer/plugins/
  2. Implement the IndexerPlugin interface
  3. Register in the PLUGINS array in src/indexer/index.ts
  4. If introducing a new AssetKind, also update:

Extension: MCP tools

The existing 8 tools are implemented by ToolHandler in src/mcp/tools.ts. Steps to add a new tool:

  1. Add a ToolDefinition to the tools array
  2. Add a case in execute()
  3. Implement the handler method
  4. Update server-instructions.ts

New tools do not require changes to transport / session unless supporting new capabilities such as MCP Resources.
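The four steps might look like the sketch below. ToolDefinition here mirrors the MCP tool shape (name, description, inputSchema), but ToolHandlerSketch is hypothetical and returns a plain string; the real ToolHandler's handler signatures and result format may differ. codegraphplus_projects is the registry tool suggested under the project group scenarios.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

type ToolArgs = Record<string, unknown>;

// Step 1: a new entry in the tools array.
const tools: ToolDefinition[] = [
  {
    name: "codegraphplus_projects",
    description: "List projects in the registry",
    inputSchema: {
      type: "object",
      properties: { projectPath: { type: "string", description: "Optional project root" } },
    },
  },
];

// Steps 2 and 3: a case in execute() that dispatches to a handler method.
class ToolHandlerSketch {
  private readonly registry: string[];

  constructor(registry: string[]) {
    this.registry = registry;
  }

  execute(name: string, args: ToolArgs): string {
    switch (name) {
      case "codegraphplus_projects":
        return this.handleProjects(args);
      default:
        throw new Error(`Unknown tool: ${name}`);
    }
  }

  private handleProjects(_args: ToolArgs): string {
    return this.registry.length ? this.registry.join("\n") : "No projects registered";
  }
}
```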

Cross-project queries (existing)

All tools support an optional projectPath parameter:

{
  "name": "codegraphplus_search",
  "arguments": {
    "query": "orders",
    "kind": "db_table",
    "projectPath": "C:/projects/order-service"
  }
}

ToolHandler.resolveGraph() walks up from that path to find .codegraphplus/ and opens an independent DB.
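The walk-up lookup can be sketched as follows. findProjectRoot is a hypothetical name for what resolveGraph is described as doing; it assumes the marker is a .codegraphplus directory and that a file path starts the search from its parent directory.

```typescript
import * as fs from "fs";
import * as path from "path";

function isDir(p: string): boolean {
  return fs.existsSync(p) && fs.statSync(p).isDirectory();
}

// Return the nearest ancestor (or the start dir itself) containing .codegraphplus/, else null.
function findProjectRoot(startPath: string): string | null {
  let dir = path.resolve(startPath);
  if (fs.existsSync(dir) && !fs.statSync(dir).isDirectory()) dir = path.dirname(dir);
  for (;;) {
    if (isDir(path.join(dir, ".codegraphplus"))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}
```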


Extension: cross-project / project group scenarios

Option A: Single init at monorepo root

cd /path/to/monorepo
codegraphplus init -i
  • discover recurses from root, covering services/*, apps/*
  • qualifiedName must include a service prefix: order-service.public.orders
  • Use utils/infer-service.ts to infer service name from path

Option B: Separate init per repository

cd service-a && codegraphplus init -i
cd service-b && codegraphplus init -i

MCP queries each repository's .codegraphplus/ separately via projectPath.

Option C: CodeGraphPlus repo holds a "project registry"

Extend .codegraphplus/config.json:

{
  "projects": [
    { "name": "order-service", "path": "../order-service" },
    { "name": "payment-service", "path": "../payment-service" }
  ]
}

Then add an MCP tool codegraphplus_projects, or list the registry in codegraphplus_status (neither exists yet; implement it yourself).
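A sketch of resolving registry entries into absolute project roots, assuming paths are relative to the project that owns config.json. resolveRegistry is hypothetical; each resolved root could then be passed as projectPath to the existing tools.

```typescript
import * as path from "path";

interface RegistryEntry {
  name: string;
  path: string;
}

interface ResolvedProject {
  name: string;
  root: string; // absolute project root, usable as the projectPath tool argument
}

// Resolve relative registry paths against the owning project; reject duplicate names.
function resolveRegistry(ownerRoot: string, projects: RegistryEntry[]): ResolvedProject[] {
  const seen = new Set<string>();
  return projects.map((p) => {
    if (seen.has(p.name)) throw new Error(`duplicate project name: ${p.name}`);
    seen.add(p.name);
    return { name: p.name, root: path.resolve(ownerRoot, p.path) };
  });
}
```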

Option D: Live multi-database connection (see Live database)

A single init location with multiple DSNs configured; all databases are written to the same codegraphplus.db.


Enabling real indexing: scheduler changes

The current indexProject is a stub implementation. After plugins are implemented, modify:

let assetsAdded = 0;
let relationsAdded = 0;

// After plugin.indexFile call:
const result = plugin.indexFile(queries, file, projectRoot);
assetsAdded += result.assetsAdded;
relationsAdded += result.relationsAdded;

// At end of indexing:
const hasAssets = assetsAdded > 0 || queries.getStats().totalAssets > 0;
queries.setMetadata('index_status', hasAssets ? 'ready' : 'stub');

return {
  filesIndexed,
  assetsAdded,
  relationsAdded,
  durationMs: Date.now() - start,
  stub: !hasAssets,
};

Unchanged files are still skipped by the existing hash check. For changed files, indexFile itself calls clearSourceFile first and then rewrites the file's assets.
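The hash check can be sketched as a partition of candidate files. SHA-256 and the stored-hash map are assumptions about how the existing logic works; only the changed list would go on to indexFile.

```typescript
import { createHash } from "crypto";

// sha256 of file contents, stored per relative path after a successful index.
function contentHash(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Split candidate files into changed (needs indexFile) and unchanged (skipped).
function partitionByHash(
  files: Map<string, string>, // relPath -> current contents
  storedHashes: Map<string, string>, // relPath -> hash from the previous run
): { changed: string[]; unchanged: string[] } {
  const changed: string[] = [];
  const unchanged: string[] = [];
  for (const [rel, content] of Array.from(files)) {
    (storedHashes.get(rel) === contentHash(content) ? unchanged : changed).push(rel);
  }
  return { changed, unchanged };
}
```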


Naming and data conventions

Recommended qualified_name formats

| kind | Format | Example |
| --- | --- | --- |
| http_api | {method} {path} or {service}::{operationId} | GET /users/{id} |
| db_table | {service}.{schema}.{table} | order-service.public.orders |
| redis_key | {service}:{pattern} | order-service:session:{userId} |
| topic | {service}:{topic} | order-service:order.created |
| dto_schema | {service}::{SchemaName} | order-service::CreateOrderRequest |
| http_dependency | {caller}→{target}:{method} {path} | order-service→payment-service:POST /charge |
| thread_pool | {service}:{poolName} | order-service:asyncExecutor |
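Centralizing these formats in one helper keeps plugins and Agent patches from drifting apart. The qn object is a hypothetical sketch that encodes the recommended formats; uppercasing the HTTP method is an added assumption.

```typescript
// One builder per kind, following the recommended qualified_name formats.
const qn = {
  httpApi: (method: string, path: string) => `${method.toUpperCase()} ${path}`,
  dbTable: (service: string, schema: string, table: string) => `${service}.${schema}.${table}`,
  redisKey: (service: string, pattern: string) => `${service}:${pattern}`,
  topic: (service: string, topic: string) => `${service}:${topic}`,
  dtoSchema: (service: string, schemaName: string) => `${service}::${schemaName}`,
  httpDependency: (caller: string, target: string, method: string, path: string) =>
    `${caller}→${target}:${method.toUpperCase()} ${path}`,
  threadPool: (service: string, poolName: string) => `${service}:${poolName}`,
};
```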

definition_json vs metadata_json

  • definition_json: Structured definition (columns, parameters, schema body), displayed by codegraphplus_node
  • metadata_json: Search and summary fields (summary, tags, service), matched by codegraphplus_search

Testing and validation

Unit tests

In __tests__/indexer/, add fixture files and assertions for each parser:

it('parses CREATE TABLE from migration', () => {
  const tables = parseCreateTables(fs.readFileSync('fixtures/V1__orders.sql', 'utf-8'));
  expect(tables[0].name).toBe('orders');
});

Integration tests

codegraphplus init -i --path /path/to/fixture-project
codegraphplus status --path /path/to/fixture-project   # expect db_table > 0 and indexStatus: ready
codegraphplus search orders --path /path/to/fixture-project

MCP tests

Refer to the spawn + JSON-RPC handshake in __tests__/indexer.test.ts; assert tools/call response content.
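For the MCP test, the messages written to the child process's stdin can be built as plain objects. This sketch assumes newline-delimited JSON-RPC over stdio; the protocolVersion and clientInfo values are placeholders and should match what the existing test in __tests__/indexer.test.ts sends.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// One JSON-RPC message per line on stdio.
function frame(msg: object): string {
  return JSON.stringify(msg) + "\n";
}

function initializeRequest(id: number): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05",
      capabilities: {},
      clientInfo: { name: "codegraphplus-test", version: "0.0.0" },
    },
  };
}

function toolCallRequest(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Sent after the initialize response; notifications carry no id.
const initializedNotification = { jsonrpc: "2.0", method: "notifications/initialized" };
```

A test would write frame(initializeRequest(1)), then frame(initializedNotification), then frame(toolCallRequest(2, ...)), and assert on the response line whose id is 2.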


Recommended implementation order

  1. db-table — SQL migrations + one ORM (most relevant to your business)
  2. openapi — HTTP API + dto_schema + uses_schema
  3. cross-link — api → table
  4. http-dependency — inter-service call chains
  5. topic / redis / thread-pool — prioritize by project tech stack
  6. db-live (optional) — config-driven live connection

After completing each plugin, update the indexProject scheduler and verify the result with codegraphplus status and the MCP tool codegraphplus_search, so that plugins can be delivered incrementally.


Agent enrichment (Enrichment)

On top of the Indexer baseline, an OpenCode / Cursor Agent reads source code and writes its understanding back to the graph via MCP. CodeGraphPlus only exports the baseline and persists structured patches; it does not run an LLM.

Flow

sequenceDiagram
  participant Agent
  participant MCP as CodeGraphPlus_MCP
  participant DB as codegraphplus.db
  Agent->>MCP: codegraphplus_traverse kind=db_table
  MCP->>DB: read assets/files
  MCP-->>Agent: displayLabel + sourceFile + nextOffset
  Agent->>Agent: Read/Grep source code
  Agent->>MCP: codegraphplus_apply_updates dryRun
  Agent->>MCP: codegraphplus_apply_updates
  MCP->>DB: upsert assets/relations

MCP tools

| Tool | Description |
| --- | --- |
| codegraphplus_traverse | Paginated traversal by kind; returns displayLabel (table name, API URL, etc.) and sourceFile; returns candidate files when no data exists |
| codegraphplus_baseline | One-shot overview: existing assets, tracked files without assets, per-kind statistics |
| codegraphplus_apply_updates | Submit an assets / relations patch; supports dryRun |

Traversal example (codegraphplus_traverse)

{
  "kind": "db_table",
  "offset": 0,
  "limit": 20
}

The response text includes displayLabel, sourceFile, hasMore, nextOffset. When no assets exist for that kind, it lists candidate migration/entity files tracked by the Indexer for the Agent to read.
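The offset/limit contract can be sketched as below. paginate is hypothetical, and returning nextOffset: null on the last page is an assumption; an Agent would keep calling traverse with offset = nextOffset while hasMore is true.

```typescript
interface Page<T> {
  items: T[];
  hasMore: boolean;
  nextOffset: number | null;
}

// Offset/limit pagination producing the hasMore/nextOffset fields described for traverse.
function paginate<T>(all: T[], offset = 0, limit = 20): Page<T> {
  const start = Math.max(0, offset);
  const size = Math.max(1, limit);
  const items = all.slice(start, start + size);
  const end = start + items.length;
  const hasMore = end < all.length;
  return { items, hasMore, nextOffset: hasMore ? end : null };
}
```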

Patch format (codegraphplus_apply_updates)

{
  "dryRun": false,
  "assets": [
    {
      "kind": "db_table",
      "name": "orders",
      "qualifiedName": "order-service.public.orders",
      "sourceFile": "migrations/V1__orders.sql",
      "definition": { "columns": [{ "name": "id", "type": "bigint" }] },
      "metadata": { "summary": "Order records" }
    }
  ],
  "relations": [
    {
      "relation": "uses_table",
      "from": { "kind": "http_api", "qualifiedName": "GET /orders", "sourceFile": "openapi.yaml" },
      "to": { "kind": "db_table", "qualifiedName": "order-service.public.orders", "sourceFile": "migrations/V1__orders.sql" }
    }
  ]
}
  • An asset entry with action: "delete" removes the specified asset
  • sourceFile must be a relative path and must not contain ..
  • metadata is merged with existing Indexer data, and enrichedBy: "agent" is written automatically
  • After a successful write, index_status is set to ready
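Two of these rules can be sketched as small helpers. isSafeSourceFile and mergeMetadata are hypothetical; the leading-slash and drive-letter checks go beyond the documented .. rule, and letting Agent values override Indexer values for the same key is an assumption about the merge order.

```typescript
// Documented rule: relative path without "..". Extra (assumed) checks: no leading "/" or drive letter.
function isSafeSourceFile(sourceFile: string): boolean {
  if (sourceFile.length === 0) return false;
  if (sourceFile.startsWith("/") || /^[A-Za-z]:/.test(sourceFile)) return false;
  return !sourceFile.includes("..");
}

// Keep Indexer keys, let the Agent patch override keys it names, and stamp enrichedBy.
function mergeMetadata(
  existing: Record<string, unknown>,
  patch: Record<string, unknown>,
): Record<string, unknown> {
  return { ...existing, ...patch, enrichedBy: "agent" };
}
```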

CLI

codegraphplus traverse --kind db_table --path /path/to/project
codegraphplus baseline --path /path/to/project
codegraphplus apply --file patches.json --path /path/to/project --dry-run

Code locations

Division of labor with Indexer

| Phase | Executor | Output |
| --- | --- | --- |
| codegraphplus index | Indexer plugins | baseline assets (optional) |
| Agent reads code | OpenCode | understanding (not inside CodeGraphPlus) |
| codegraphplus_traverse | Agent via MCP | items to verify, by kind |
| codegraphplus_apply_updates | Agent via MCP | enriched / corrected assets and relations |

The Agent should use real source paths as sourceFile to align with Indexer results; when the Indexer re-runs clearSourceFile, it will not mistakenly delete Agent assets under other paths.