Query the in-memory graph with Cypher by paracycle · Pull Request #868 · Shopify/rubydex

paracycle · 2026-06-18T22:03:48Z

Goal

Give clients a flexible, future-proof way to query the in-memory graph using Cypher, the de facto standard graph query language. Instead of adding a bespoke method for every traversal, this exposes the graph through a query language clients already know, so new introspection needs become queries rather than new APIs. Queries are read-only and run directly against the existing in-memory Graph (no duplication, no embedded database).

References

openCypher specification: https://opencypher.org/resources/
Cypher query language (Neo4j docs): https://neo4j.com/docs/cypher-manual/current/

Architecture

The Cypher engine itself — lexer, recursive-descent parser, AST, the tree-walking executor, values, and result formatting — lives in a separate, published crate, cypher-parser. The executor is generic over a GraphProvider trait, so it has no dependency on rubydex.

rubydex depends on cypher-parser and provides only the rubydex-specific pieces:

query::cypher::schema — impl GraphProvider for Graph, the property-graph mapping.
query::cypher::schema_info — the static schema description.
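As a rough illustration of the split described above, here is a minimal Ruby sketch of an executor that only talks to a provider interface. The real trait is `GraphProvider` in the Rust crate; the class `HashGraphProvider` and the methods `nodes_with_label`, `property`, and `match_labels` below are hypothetical names invented for this sketch, not the crate's API.

```ruby
# Hypothetical provider: the "executor" below only ever calls these two methods,
# so it needs no knowledge of how the graph is stored.
class HashGraphProvider
  def initialize(nodes)
    @nodes = nodes # array of { labels: [...], props: {...} }
  end

  def nodes_with_label(label)
    @nodes.select { |node| node[:labels].include?(label) }
  end

  def property(node, key)
    node[:props][key]
  end
end

# Toy executor for a query shaped like:
#   MATCH (n:A|B) RETURN n.<prop> ORDER BY n.<prop>
def match_labels(provider, labels, prop)
  labels.flat_map { |label| provider.nodes_with_label(label) }
        .uniq                                   # a node matching several labels is returned once
        .map { |node| provider.property(node, prop) }
        .sort
end
```

Any object that answers the same two methods could be swapped in, which is the property that lets the engine be tested against an in-memory provider while rubydex supplies its own mapping.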

How it's exposed

rubydex_cli: --query "<CYPHER>", --schema, --format table|json.
Ruby API — the whole query API lives on Rubydex::Query:
- Rubydex::Query.parse(str) → an opaque, reusable parsed query (raises ArgumentError on syntax errors, needs no graph).
- Rubydex::Query#render(graph, format = :table) → runs a parsed query against a graph and returns the formatted output (table or JSON).
- Rubydex::Query.schema(format = :table) → describes the queryable schema.

rdx command CLI:

rdx query <CYPHER> [--format table|json]
rdx schema         [--format table|json]
rdx console

Parse first, then build the graph

Both CLIs parse the query into the opaque parsed object before indexing/resolution, so a malformed query fails fast (~0.1s) instead of after a full workspace index:

parse query  ->  build graph (index + resolve)  ->  run parsed query

A parsed Rubydex::Query is reusable: parse once, run against many graphs.
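The ordering above can be sketched with stand-in objects. This is not the rdx source: `ParsedQuery`, `parse_query`, and `run_query_command` are hypothetical names, and the "parser" only checks for a leading MATCH so that the control flow stays visible.

```ruby
# Stand-in for a parsed query. The real Rubydex::Query is opaque and FFI-backed.
ParsedQuery = Struct.new(:source) do
  def render(graph, format = :table)
    "#{format}: ran #{source.inspect} against #{graph.size} nodes"
  end
end

# Hypothetical parser: accepts anything that starts with MATCH.
def parse_query(source)
  unless source.strip.upcase.start_with?("MATCH")
    raise ArgumentError, "syntax error: expected MATCH"
  end

  ParsedQuery.new(source)
end

# Parse first, then build the (expensive) graph, then run the parsed query.
def run_query_command(source, build_graph:)
  query = parse_query(source) # cheap; raises before any indexing happens
  graph = build_graph.call    # expensive; only reached for well-formed queries
  query.render(graph, :json)
end
```

Passing the graph builder as a callable makes the ordering easy to check: with a malformed query the builder is never invoked.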

Graph schema exposed to queries

rdx schema / rubydex_cli --schema / Rubydex::Query.schema print this model:

Node labels: Document, Definition, Declaration, the grouping label Namespace, and declaration kind sub-labels (Class, Module, SingletonClass, Method, Constant, ConstantAlias, GlobalVariable, InstanceVariable, ClassVariable).

Relationship types: DEFINES (Document→Definition), DECLARES (Definition→Declaration), CONTAINS (nesting), INHERITS (superclass), INCLUDES/PREPENDS/EXTENDS (mixins), OWNS (members), ANCESTOR, DESCENDANT, REFERENCES (Document→Declaration).

Properties: Declaration: name, unqualified_name, kind, visibility, definition_count; Definition: kind, name, file, line; Document: uri, path.

Supported syntax: MATCH (node patterns with label disjunction :A|B and inline properties; relationship patterns with direction, type lists, and variable length *min..max), WHERE (=, <>, <, <=, >, >=, CONTAINS, STARTS WITH, ENDS WITH, AND/OR/NOT), RETURN (DISTINCT, AS, aggregates count/collect/min/max/sum/avg), ORDER BY, SKIP, LIMIT. Read-only; write clauses are intentionally unsupported.
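To make the variable-length form `*min..max` concrete, here is a small Ruby sketch of one way such a pattern can be evaluated: enumerate paths hop by hop, never reusing an edge within a path, and collect end nodes whose depth falls in range. It is an assumption-laden illustration, not the cypher-parser executor; the `INHERITS` table and both method names are made up for the example.

```ruby
require "set"

# Toy INHERITS edges (subclass => superclasses). Names are illustrative only.
INHERITS = {
  "Admin" => ["User"],
  "User" => ["ApplicationRecord"],
  "Post" => ["ApplicationRecord"],
  "ApplicationRecord" => ["Object"],
}.freeze

# Collects every node reachable from `node` over min..max hops, never reusing
# the same edge within one path. max = nil means "no upper bound".
def var_length_targets(edges, node, min: 1, max: nil, depth: 0, used: [], out: Set.new)
  out << node if depth >= min
  return out if max && depth >= max

  edges.fetch(node, []).each do |succ|
    edge = [node, succ]
    next if used.include?(edge) # an edge may appear only once per path

    var_length_targets(edges, succ, min: min, max: max,
                       depth: depth + 1, used: used + [edge], out: out)
  end
  out
end

# Rough analogue of:
#   MATCH (c)-[:INHERITS*1..]->(p {name: target}) RETURN DISTINCT c.name
def transitive_subclasses(edges, target)
  edges.keys.select { |c| var_length_targets(edges, c).include?(target) }.sort
end
```

With this toy data, `transitive_subclasses(INHERITS, "ApplicationRecord")` returns `["Admin", "Post", "User"]`; `ApplicationRecord` itself is excluded because the lower bound is one hop.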

Try it

# Discover the model
rdx schema

# All classes or modules
rdx query "MATCH (n:Class|Module) RETURN n.name ORDER BY n.name"

# All (transitive) subclasses of a base class, as JSON
rdx query "MATCH (c:Class)-[:INHERITS*1..]->(p {name: 'ApplicationRecord'}) RETURN DISTINCT c.name" --format json

# Count definitions per file
rdx query "MATCH (d:Document)-[:DEFINES]->(def:Definition) RETURN d.path, count(def) AS defs ORDER BY defs DESC"

From Ruby:

query = Rubydex::Query.parse("MATCH (n:Class|Module) RETURN n.name")  # fails fast on bad syntax
graph = Rubydex::Graph.new
graph.index_workspace
graph.resolve
puts query.render(graph, :json)

Commits

Add read-only Cypher query engine over the in-memory graph
Add --query and --schema flags to rubydex_cli
Expose Cypher via Rubydex::Graph and the rdx command CLI
Extract the Cypher engine into the standalone cypher-parser crate
Parse Cypher queries into a reusable Query object before building the graph

Introduce a hand-written Cypher subset engine (lexer, recursive-descent parser, and tree-walking executor) that runs read-only queries directly against the in-memory Graph, with no external parser or database dependency and no graph duplication. The graph is exposed as a property graph: node labels (Document, Definition, Declaration plus kind sub-labels and the Namespace grouping) and relationship types (DEFINES, DECLARES, CONTAINS, INHERITS, INCLUDES, PREPENDS, EXTENDS, OWNS, ANCESTOR, DESCENDANT, REFERENCES) mirror the DOT exporter's schema. Supported syntax: MATCH with node patterns (label disjunction, inline properties), relationship patterns (directions, type lists, variable length), WHERE (comparisons, CONTAINS/STARTS WITH/ENDS WITH, AND/OR/NOT), RETURN with DISTINCT/aliases/aggregates, and ORDER BY/SKIP/LIMIT. Results render as a text table or JSON. A static description of the queryable schema (labels, relationship types, and properties) is also available via `cypher::schema`.

Wire the Cypher engine into the CLI with --query <CYPHER> to run a query and --schema to print the queryable schema (labels, relationships, properties). The output format is selected with --format <table|json> (default table). Queries run after resolution; --schema is static and exits before indexing. Parse and execution errors go to stderr with a non-zero exit. Add CLI integration tests for query output, schema output, and error handling.

Add FFI exports (rdx_graph_query and rdx_cypher_schema) in rubydex-sys, bind them as the Graph#query instance method and the Graph.cypher_schema class method, and add their Sorbet signatures. query accepts an optional format (String or Symbol, default :table) and raises ArgumentError on parse, execution, or format errors. Restructure the exe/rdx executable around subcommands: `rdx query <CYPHER>`, `rdx schema`, and `rdx console` (the interactive session), each with a --format option where applicable. Cover the Ruby API with tests for query output, schema output, format coercion, label disjunction, and error handling.

Move the entire Cypher engine — lexer, parser, AST, the tree-walking executor, values, and result formatting — out of rubydex and into the standalone, published `cypher-parser` crate (depended on from crates.io). The executor is generic over `cypher_parser::GraphProvider`, so rubydex only provides the rubydex-specific mapping by implementing that trait for `Graph` (in `query::cypher::schema`), plus the static `--schema` description (in `query::cypher::schema_info`). This separates the query language and its execution from the rubydex graph, letting the engine be versioned, tested, and reused independently. The executor's own tests live in the cypher-parser crate (against an in-memory provider); rubydex keeps end-to-end tests against a real Graph.

Split query handling into an explicit parse step and a render step so a malformed query fails fast, before the expensive workspace indexing and resolution.

- rubydex_cli: parse `--query` up front (exiting on a syntax error before any listing/indexing), then run the pre-parsed query against the graph via cypher::run_parsed.
- Gem: add an opaque `Rubydex::Query` object:
  * `Rubydex::Query.parse(str)` parses without a graph, raising ArgumentError on a syntax error;
  * `Query#render(graph, format)` runs the parsed query against a graph and returns the formatted output;
  * `Rubydex::Query.schema(format)` describes the queryable schema.
  Backed by new FFI exports (rdx_cypher_parse, rdx_cypher_query_free, rdx_query_run). The query API now lives entirely on `Rubydex::Query`: the previous `Graph#query` and `Graph.cypher_schema` methods are removed.
- exe/rdx: `query` parses first, then builds the graph, then renders the parsed query against it; `schema` uses `Rubydex::Query.schema`.

vinistock

Still trying to wrap my head around the entire engine, but left some comments already. Excited to have a unified way of querying the graph.

I wonder if there's some IRB trick we can use to enter a "query" mode that accepts the Cypher queries directly (non-valid Ruby). Something like:

bundle exec rdx -i
Indexing...
Resolving...
> graph["Foo"]
=> <Declaration ...>
>
> query_mode!
> MATCH (n:Class|Module) RETURN n.name ORDER BY n.name
=> [Foo]

vinistock · 2026-06-24T21:38:56Z

+end
+
+# Builds the workspace graph, sending progress messages to `progress_io`.
+def build_graph(progress_io)


Other than quick one-off switches, like --version, I would assume everything in this executable always depends on a populated graph (like interactive mode or query).

What do you think of keeping the one-off switches as early returns at the top, then we populate the graph and the different commands simply perform different operations on it?

Generally, that is correct, except for the schema subcommand handling, which returns a rendering of the schema (with documentation) without needing to index anything.

I could turn the schema subcommand into a --schema flag for the query subcommand, if you want. That would then become special handling for that subcommand, and we can do what you are suggesting.

Let me know what you prefer.

vinistock · 2026-06-24T22:16:38Z

+/// # Errors
+///
+/// Returns a [`CypherError`] if the query cannot be executed.
+pub fn run_parsed(graph: &Graph, query: &Query, output_format: OutputFormat) -> Result<String, CypherError> {


Is there a scenario where a consumer would use this? Is it for caching the parsed query somewhere and then skipping the parse step?

So, this is actually the main way that we are using the API, for 2 main reasons:

1. It makes sure that if there are any query parsing errors, we can catch those early at query parse time, which we can do before we index the codebase. Then we pass the parsed query to this method to run it against a graph.

2. We can also use this to cache queries and run them multiple times against the same graph.

My motivation for the split was mainly for 1, but 2 comes as a nice by-product.

At this point, the complementary run_query (which takes a string query) is in the PR as a utility method and is only used in tests. We can remove that version, if you want.

vinistock · 2026-06-24T22:19:40Z

+pub enum RelType {
+    Defines,
+    Declares,
+    Contains,


What does contains represent?

CONTAINS represents lexical nesting between definitions, so it is a Definition to Definition edge. For example, a class written textually inside a module in the same file. It's the source-level counterpart of OWNS, which is declaration-level membership merged across all files.

I will be documenting this directly on the type so that every RelType variant has a doc comment spelling out its source/target node type and meaning.
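The distinction can be illustrated with toy data. The sketch below is a simplification under stated assumptions: it derives OWNS purely from lexical nesting, whereas the real edges come from rubydex's resolution, and the `Definition` struct, `DEFINITIONS` table, and both method names are hypothetical.

```ruby
# One record per textual definition; `parent_id` is the lexically enclosing definition.
Definition = Struct.new(:id, :name, :file, :parent_id)

# `Shop` is reopened in two files, each nesting a different class.
DEFINITIONS = [
  Definition.new(1, "Shop", "a.rb", nil),
  Definition.new(2, "Shop::Cart", "a.rb", 1),
  Definition.new(3, "Shop", "b.rb", nil),
  Definition.new(4, "Shop::Order", "b.rb", 3),
].freeze

# CONTAINS: one Definition -> Definition edge per lexical nesting, per file.
def contains_edges(definitions)
  definitions.select(&:parent_id).map { |d| [d.parent_id, d.id] }
end

# OWNS: Declaration -> Declaration membership keyed by name, merged across files.
def owns_edges(definitions)
  by_id = definitions.to_h { |d| [d.id, d] }
  definitions.select(&:parent_id)
             .map { |d| [by_id.fetch(d.parent_id).name, d.name] }
             .uniq
             .group_by(&:first)
             .transform_values { |pairs| pairs.map(&:last) }
end
```

Here CONTAINS yields two separate edges (one per file), while OWNS collapses both `Shop` definitions into a single declaration that owns `Shop::Cart` and `Shop::Order`.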

vinistock · 2026-06-24T22:23:56Z

+}
+
+/// Walks constant-alias chains until reaching a namespace declaration.
+fn resolve_to_namespace(graph: &Graph, declaration_id: DeclarationId) -> Option<DeclarationId> {


This method already exists in query.rs (although it may not be handling the circular alias case).

Can we make that one public instead?

Agree there's duplication worth removing, but they're not quite the same and I want to unify carefully rather than just making the existing one public.

query.rs::resolve_to_namespace returns Result<Option<…>> (it errors when a declaration is neither a namespace nor an alias-to-namespace) and does a single resolve_alias step. This method returns Option<…> and walks alias chains in a loop to handle cyclic aliases, which is the case you are also noting.

We could extract a shared helper that keeps the cyclic-alias handling, and then have query.rs adopt it as well. But, in this PR, I want to stay away from code that is behaviour changing in the core.
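For readers following the thread, the loop-with-cycle-guard approach described here can be sketched in a few lines of Ruby. This mirrors the described behaviour only (return nothing on a cycle or a non-namespace target); the `DECLARATIONS` table, its hash shape, and `NAMESPACE_KINDS` are assumptions for illustration, not the Rust implementation.

```ruby
require "set"

# Toy declaration table: each entry is a namespace, an alias, or something else.
DECLARATIONS = {
  "Foo" => { kind: :class },
  "A" => { kind: :alias, target: "B" },
  "B" => { kind: :alias, target: "Foo" },
  "X" => { kind: :alias, target: "Y" },
  "Y" => { kind: :alias, target: "X" }, # X and Y form a cycle
  "CONST" => { kind: :constant },
}.freeze

NAMESPACE_KINDS = %i[class module singleton_class].freeze

# Walks alias chains until reaching a namespace declaration.
# Returns nil for cycles, unknown names, and non-namespace targets.
def resolve_to_namespace(declarations, name)
  seen = Set.new
  current = name
  loop do
    return nil unless seen.add?(current) # add? returns nil if already visited => cycle

    declaration = declarations[current]
    return nil if declaration.nil?
    return current if NAMESPACE_KINDS.include?(declaration[:kind])
    return nil unless declaration[:kind] == :alias

    current = declaration[:target]
  end
end
```

The visited set is what distinguishes this from a single `resolve_alias` step: a chain such as `X -> Y -> X` terminates with `nil` instead of looping forever.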

Follow the modern Rust module convention (path.rs alongside a path/ directory) instead of the legacy path/mod.rs style. Pure file move; the cypher/ directory keeps the schema, schema_info, and tests submodules.

CONTAINS is per-file lexical nesting (Definition -> Definition), e.g. a class written inside a module; OWNS is the declaration-level membership counterpart, merged across all files. Add per-variant doc comments to RelType and clarify both in the module-level schema docs.

paracycle · 2026-06-25T20:13:21Z

I wonder if there's some IRB trick we can use to enter a "query" mode that accepts the Cypher queries directly (non-valid Ruby). Something like:
bundle exec rdx -i
Indexing...
Resolving...
> graph["Foo"]
=> <Declaration ...>
>
> query_mode!
> MATCH (n:Class|Module) RETURN n.name ORDER BY n.name
=> [Foo]

Great idea. I am prototyping what's possible here, but obviously it won't be a part of this PR.

paracycle · 2026-06-25T20:26:27Z

Ok, this PR has a prototype for the console mode extension: #883

Previously the Document `path` property returned the URI basename, making it identical to a name and mislabeled. Split them:

- `uri` -> full document URI (e.g. file:///app/models/user.rb)
- `path` -> file-system path (e.g. /app/models/user.rb)
- `name` -> base file name (e.g. user.rb)

Add `Document::file_path` / `Document::file_name`, which decode the URI via the `url` crate (already a dependency) so percent-encoding and platform paths (including Windows drive paths) are handled correctly instead of naively splitting on '/'. `require_path` now reuses `file_path` instead of re-parsing the URI. Non-file:// URIs (the synthetic built-in document) fall back to the raw URI.

Clarify that `prop` is the property name read off a node, and advertise the new `name` property in the schema.
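The three properties can be illustrated with Ruby's standard `uri` library. This is a sketch of the mapping only, not the Rust code: `DocumentProps` and `document_props` are hypothetical names, Windows drive paths are not handled here, and falling back to the raw URI for both `path` and `name` on non-file URIs is an assumption.

```ruby
require "uri"

DocumentProps = Struct.new(:uri, :path, :name)

# Derives uri / path / name from a document URI by parsing it rather than
# splitting on "/", so percent-encoded characters are decoded.
def document_props(uri_string)
  parsed = URI.parse(uri_string)
  unless parsed.scheme == "file"
    # Assumption: non-file URIs keep the raw URI for every property.
    return DocumentProps.new(uri_string, uri_string, uri_string)
  end

  path = URI::RFC2396_Parser.new.unescape(parsed.path)
  DocumentProps.new(uri_string, path, File.basename(path))
rescue URI::InvalidURIError
  DocumentProps.new(uri_string, uri_string, uri_string)
end

props = document_props("file:///app/models/user.rb")
puts props.path
puts props.name
```

Decoding before taking the basename matters for paths such as `file:///app/my%20models/user.rb`, where the directory name contains an encoded space.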

paracycle · 2026-06-25T20:44:52Z

@vinistock If it is helpful, I had this diagram generated from the codebase for how the query engine works at a high level:

paracycle requested a review from a team as a code owner June 18, 2026 22:03

paracycle mentioned this pull request Jun 18, 2026

Add opt-in client/server mode to the rdx executable #869

Open

paracycle force-pushed the uk_add_cypher_query_engine branch from 0e33938 to 49ed15d Compare June 19, 2026 00:16

paracycle added 3 commits June 19, 2026 03:21

paracycle force-pushed the uk_add_cypher_query_engine branch 2 times, most recently from abfcb24 to aca5184 Compare June 23, 2026 18:46

paracycle force-pushed the uk_add_cypher_query_engine branch 2 times, most recently from cb17dab to 2e6a202 Compare June 23, 2026 19:42

paracycle mentioned this pull request Jun 23, 2026

Return Cypher query results as graph objects #873

Open

paracycle force-pushed the uk_add_cypher_query_engine branch from 2e6a202 to bc2a231 Compare June 23, 2026 21:16

vinistock reviewed Jun 24, 2026


paracycle added 2 commits June 25, 2026 22:20

Use cypher.rs instead of cypher/mod.rs module file

1fc8598


paracycle requested a review from vinistock June 25, 2026 20:09

paracycle mentioned this pull request Jun 25, 2026

Run Cypher queries from the rdx console #883

Open

paracycle force-pushed the uk_add_cypher_query_engine branch from f6e2615 to 7e08a7b Compare June 25, 2026 20:31
