Skip to content

feat(langservice): LSP-shaped hover (Markdown) + new signature-help endpoint#44

Open
joewiz wants to merge 5 commits into
eXist-db:developfrom
joewiz:feat/hover-markdown-and-signature-help
Open

feat(langservice): LSP-shaped hover (Markdown) + new signature-help endpoint#44
joewiz wants to merge 5 commits into
eXist-db:developfrom
joewiz:feat/hover-markdown-and-signature-help

Conversation

@joewiz

@joewiz joewiz commented Jun 5, 2026

Copy link
Copy Markdown
Member

[This PR was co-authored with Claude Code. -Joe]

Brings /api/langservice/hover in line with LSP's letter and spirit, and adds a sibling /api/langservice/signature-help endpoint. The goal — backed by a side-by-side analysis of eXide's F1 Function Documentation panel (which builds its rich Parameters / Returns / Template layout from a separate /api/editor/completions endpoint, not from hover) — is to give every client of existdb-openapi the same structured material the eXide F1 panel renders, via LSP-correct channels.

POST /api/langservice/hover — shape change

Breaking change. Before:

{ "contents": "fn:count($arg as item()*) as xs:integer\n\nReturns…", "kind": "function" }

After (matches LSP Hover + MarkupContent):

{
  "contents": {
    "kind": "markdown",
    "value": "```xquery\nfn:count($arg as item()*) as xs:integer\n```\n\nReturns the number of items in the input sequence.\n\n**Parameters**\n\n- `$arg` (`item()*`) — The input sequence\n\n**Returns:** `xs:integer`"
  }
}

The top-level kind field is removed (it was a non-LSP extension). The Markdown body includes a fenced XQuery code block for the signature, the description, a **Parameters** bullet list with per-parameter types and docs, and a **Returns:** line.

POST /api/langservice/signature-help — new endpoint

Matches LSP SignatureHelp exactly:

{
  "signatures": [
    {
      "label": "substring($source as xs:string?, $starting-at as xs:double, $length as xs:double) as xs:string",
      "documentation": { "kind": "markdown", "value": "Returns the portion of …" },
      "parameters": [
        { "label": "$source as xs:string?",     "documentation": { "kind": "markdown", "value": "`$source as xs:string?`\n\nThe source string" } },
        { "label": "$starting-at as xs:double", "documentation": { "kind": "markdown", "value": "`$starting-at as xs:double`\n\nThe starting position" } },
        { "label": "$length as xs:double",      "documentation": { "kind": "markdown", "value": "`$length as xs:double`\n\nThe number of characters …" } }
      ]
    }
  ],
  "activeSignature": 0,
  "activeParameter": 1
}

activeParameter is computed by scanning back from the cursor to the enclosing call's opening paren and counting commas at the call's paren depth. String literals (single + double, including doubled-quote escapes) are skipped so commas inside strings don't shift the index. Returns null if the cursor isn't inside a function call's argument list.

Why this shape (LSP letter and spirit)

LSP intentionally splits "what is this symbol?" across three messages:

  • textDocument/hover — narrative, Markdown, designed for tooltips.
  • textDocument/signatureHelp — structured per-parameter, designed for "I'm typing inside a call."
  • textDocument/completion + completion-item-resolve — structured per-symbol, designed for "I'm picking from a list."

The per-parameter <dl> you see in eXide's F1 panel is signature-help territory in LSP terms, not hover. The "Template" / snippet is a completion-item concern (existdb-openapi's completions already carry insertText with insertTextFormat: 2 as of #42). So:

  • Hover gets richer the LSP way (Markdown contents) — Option B from the design discussion. ✓ this PR
  • Signature-help becomes its own endpoint — exactly where LSP routes per-parameter detail. ✓ this PR
  • Completion-item snippets for function args — follow-up PR on top of feat(langservice): scope completions to trailing prefix + LSP-shaped output (#31) #42 once that merges (the function items currently use plain name() insertText; the small change is to switch to name(${1:$arg1}, ${2:$arg2}) so accepting drops the user into tab stops, matching eXide's "Template" line).

Implementation

  • MarkdownFormatter (new) — shared helper. Builds Markdown for hover and per-parameter Markdown for signature-help. Extracts param names and docs from FunctionParameterSequenceType (and return-type docs from FunctionReturnSequenceType); both are available on the FunctionSignature Java API and carry the docs that built-ins register via FunctionDSL.param(…, "description").
  • Hover.javabuildFunctionHover / buildVariableHover delegate to MarkdownFormatter and wrap the result in a MarkupContent map ({kind: "markdown", value: ...}).
  • SignatureHelp.java (new) — Java function lang:signature-help, registered in LangServiceModule. Uses a position-finding visitor modelled on Hover.NodeAtPositionFinder but specific to function calls (built-in + user-defined). Active-parameter logic is its own helper with the string-skip / paren-depth scanner factored to keep PMD's NPathComplexity under threshold.
  • modules/langservice.xqm — REST wrapper.
  • modules/api.json — hover response schema updated; new signature-help operation added with full request/response schemas and examples.

Verified

  • All 12 langservice cypress tests pass (3 new for signature-help; the existing hover test rewritten to assert the new MarkupContent shape, fenced code block, Parameters and Returns sections).
  • PMD clean on all changed Java files.
  • Smoke-tested manually: count((1,2,3)) hover renders the full Markdown with Parameters + Returns; substring("abc", 2, 1) signature-help at col 17 returns activeParameter: 1 (cursor past the second comma); 1 + 2 signature-help returns null.

Cross-client coordination

This is a breaking change for hover consumers. As Joe noted, existdb-openapi is still early in its life (only out in eXist 7 beta), so coordinated client updates are tractable. I'll file follow-up issues / PRs on:

  • eXidesrc/lang-hover.js:107 currently does data.contents.split("\n\n") on the old flat string; needs to read data.contents.value (and ideally render Markdown so users see the full Parameters/Returns sections in the hover tooltip itself, not just in F1).
  • existdb-oxygen-plugin — current plain-text hover display can render the new Markdown body directly. The hover-strategy session is the natural place for this; this PR's PR description includes the new shape they need.
  • existdb-langserver / vscode-existdb / notebook — heads-up issue on each to note the shape change before they pick up the next existdb-openapi release.

Want me to file those issues now (one per repo, with the same shape-change summary), or wait until this PR lands?

Open follow-ups (not in this PR)

joewiz added 2 commits June 5, 2026 16:01
Brings the hover endpoint and a new signature-help endpoint in line with
LSP's letter and spirit, so client UIs (eXide, Oxygen plugin, vscode,
notebook) can render rich function docs without re-fetching from
secondary endpoints.

**`POST /api/langservice/hover`** — breaking change to the response
shape. Previously:

    { "contents": "<flat text>", "kind": "function" }

Now (matches LSP `Hover` + `MarkupContent`):

    {
      "contents": {
        "kind": "markdown",
        "value": "```xquery\nfn:count($arg as item()*) as xs:integer\n```\n\nReturns…\n\n**Parameters**\n\n- `$arg` (`item()*`) — The input sequence\n\n**Returns:** `xs:integer`"
      }
    }

The top-level `kind` field is removed (it was a non-LSP extension).
Markdown body includes a fenced XQuery code block for the signature,
the function description, a per-parameter bullet list with types and
docs, and a Returns line.

**`POST /api/langservice/signature-help`** — new endpoint, matches LSP
`SignatureHelp` exactly: `signatures: SignatureInformation[]`,
`activeSignature`, `activeParameter`. Each `SignatureInformation` has
`label`, `documentation` (MarkupContent), and `parameters[]` with
per-parameter `label` and `documentation`. The active parameter is
computed by scanning back from the cursor to the call's opening paren
and counting commas at the call's paren depth, skipping string literals.

**Implementation:**

- `MarkdownFormatter` (new) — shared helper that builds signature
  Markdown for hover and per-parameter Markdown for signature-help,
  extracting param names and docs from `FunctionParameterSequenceType`.
- `Hover.java` — `buildFunctionHover` / `buildVariableHover` now
  delegate to `MarkdownFormatter` and wrap the result in a
  MarkupContent map.
- `SignatureHelp.java` (new) — Java function `lang:signature-help`,
  registered in `LangServiceModule`. Uses a position-finding visitor
  modelled on `Hover.NodeAtPositionFinder` but specific to function
  calls (built-in + user-defined).
- `modules/langservice.xqm` — REST wrapper for the new function.
- `modules/api.json` — updated hover response schema; added new
  signature-help operation with full request/response schema.
- Cypress: hover tests verify the new shape; 3 new tests cover
  signature-help happy path, activeParameter computation, and
  "cursor not in a call" null response.

Client coordination note (not in this PR): eXide's `src/lang-hover.js`
and any other consumer that calls `data.contents.split("\n\n")` on the
old flat-string shape will need to read `data.contents.value` instead.
Filed as a follow-up.

Refs eXide F1 Function Documentation panel (`src/funcdoc-tooltip.js`)
which produces the rich layout this work makes available to all
existdb-openapi clients.
Cardinality.EMPTY_SEQUENCE.toXQueryCardinalityString() returns the
literal "empty-sequence()" rather than a postfix marker like ?/*/+, so
the naive `name + card` concat in MarkdownFormatter.formatType()
produced garbled output for functions returning empty-sequence():

- util:log  → **Returns:** `empty-sequence()empty-sequence()`
- util:wait → **Returns:** `item()empty-sequence()`

Detect that cardinality and emit "empty-sequence()" standalone instead.

Cypress: regression test on util:log's hover; also asserts the two
garbled forms never appear.

Spotted by the oxygen-plugin team adopting the new hover shape; thanks!
@joewiz

joewiz commented Jun 5, 2026

Copy link
Copy Markdown
Member Author

[This response was co-authored with Claude Code. -Joe]

The root cause of the issue just fixed in 2446189: Cardinality.EMPTY_SEQUENCE.toXQueryCardinalityString() returns the literal "empty-sequence()" rather than a postfix marker like ?/*/+, so MarkdownFormatter.formatType()'s naive name + card concat produced "item()empty-sequence()" / "empty-sequence()empty-sequence()" for functions declared as empty-sequence().

Now detects the empty-sequence cardinality and emits "empty-sequence()" standalone. Verified against three cases:

  • util:log**Returns:** `empty-sequence()`
  • util:wait**Returns:** `empty-sequence()` — Returns an empty sequence
  • xmldb:get-child-collections**Returns:** `xs:string*` — the sequence of child collection names ✓ (regression check)

Added a cypress regression test that asserts util:log's hover renders cleanly and that neither garbled form ever reappears.

joewiz added 2 commits June 5, 2026 17:06
…Signature by call-site arity

Per oxygen-plugin feedback on eXist-db#44, signature-help previously returned a
single-element \`signatures[]\` even for functions with multiple arities
(e.g. \`substring\` has eXist-db#2 and eXist-db#3). Real consumers want the full set so
they can show the right overload as the user fills in arguments.

\`POST /api/langservice/signature-help\` now returns all overloads of the
function at the cursor, sorted by arity ascending. \`activeSignature\`
points to the overload whose arity matches the call-site arity that the
parser resolved to.

The overload collection walks the resolved function's namespace module
(\`pContext.getAllModules()\` filtered by \`namespaceURI\`) and
\`pContext.localFunctions()\` for user-declared overloads, then dedups by
arity. \`pContext.getSignaturesForFunction()\` alone wasn't sufficient —
it only returns signatures the current compilation has *loaded*, so
arities the user code didn't reference are missing.

Verified manually:
- \`substring("abc", 2, 1)\` → 2 sigs (eXist-db#2 + eXist-db#3), activeSignature → eXist-db#3
- \`substring("abc", 2)\`    → 2 sigs (eXist-db#2 + eXist-db#3), activeSignature → eXist-db#2
- \`count((1,2,3))\`         → 1 sig  (#1), activeSignature → #1

Cypress: 2 new tests cover both-arities-returned and
activeSignature-matches-call-site. Also fixed an existing test that
hardcoded \`signatures[0]\` rather than reading
\`signatures[activeSignature]\` — that was fine when we returned a
single sig, but is wrong now that \`signatures\` is sorted by arity
ascending and \`[0]\` may not be the active one.
Per oxygen-plugin feedback on eXist-db#44, the previous AST-based call lookup
required the expression to parse cleanly — which fails for the mid-
typing states where parameter help is most valuable: \`util:log(\`,
\`util:log("info",\`, \`substring("abc", 2, \`, etc. Today's parser sees
an incomplete call and returns no signatures.

Resolution is now name-based and decoupled from successful parsing:

1. Forward-scan the raw text from the start of the expression to the
   cursor, maintaining a stack of unmatched open parens (string
   literals skipped). The top of the stack is the enclosing call's
   open paren.
2. Walk left from that paren over whitespace, then capture an NCName
   (optionally prefixed) — that's the function name.
3. Resolve the prefix via the XQuery context's standard namespaces
   (fn, xs, util, xmldb, map, array, math, …); unprefixed names use the
   default function namespace.
4. Collect all loaded signatures with that QName (same module-walk +
   localFunctions approach used elsewhere), sorted by arity ascending.
5. Count top-level commas before AND after the cursor up to the
   matching close paren. activeParameter = before-count. Intended
   arity = before + after + 1. activeSignature = smallest overload with
   arity ≥ intendedArity (or largest available if none fits).

The "count commas after the cursor too" piece matters for complete
calls like \`substring("abc", 2, 1)\` with the cursor on the \`2\`:
commas-before-cursor alone would pick eXist-db#2, but the user clearly intends
eXist-db#3. With the after-count, intendedArity = 1 + 1 + 1 = 3 → eXist-db#3. ✓

Verified against the cases the oxygen team flagged:

  util:log(                  → sigs=1, active=eXist-db#2, activeParam=0
  util:log("info",           → sigs=1, active=eXist-db#2, activeParam=1
  util:log("info", "x",      → sigs=1, active=eXist-db#2 (clamped), activeParam=1
  substring("abc",           → sigs=2, active=eXist-db#2, activeParam=1
  substring("abc", 2,        → sigs=2, active=eXist-db#3, activeParam=2
  substring("abc", 2, 1)     → sigs=2, active=eXist-db#3, activeParam=1 (regression: still right)
  count(                     → sigs=1, active=#1, activeParam=0
  1 + 2                      → null (not in a call)
  foo:bar(                   → null (unknown namespace prefix)

The full XQueryLexer / XQueryParser / TreeParser scaffolding is gone
from this file — text-scan is sufficient for the call-finding work and
much more lenient than requiring a successful parse.

Cypress: 2 new tests — one covers four mid-typing positions
(util:log(, util:log("info",, substring("abc",, substring("abc", 2,);
the other pins "unknown function name → null".
@joewiz

joewiz commented Jun 5, 2026

Copy link
Copy Markdown
Member Author

[This response was co-authored with Claude Code. -Joe]

In d8d6936, switched resolution to be name-based and decoupled from a successful parse, so mid-typing states produce help.

What it does:

  1. Forward-scan the raw text from the start of the expression to the cursor, maintaining a stack of unmatched open parens (string literals skipped). Top of stack = enclosing call's (.
  2. Walk left from that ( over whitespace, then capture an NCName (optionally prefixed) — that's the function name.
  3. Resolve the prefix via the XQuery context's standard namespaces (fn, xs, util, xmldb, map, array, math, …); unprefixed names use the default function namespace.
  4. Collect all loaded signatures with that QName (same module-walk + localFunctions() approach as before), sorted by arity ascending.
  5. Count top-level commas before AND after the cursor up to the matching close paren. activeParameter = before-count. Intended arity = before + after + 1. activeSignature = smallest overload with arity ≥ intendedArity (or largest available if none fits).

The "count after the cursor too" piece keeps the existing complete-call behaviour correct: for substring("abc", 2, 1) with the cursor on the 2, commas-before-cursor alone would pick #2, but the user clearly intends #3. With the after-count, intendedArity = 1 + 1 + 1 = 3 → #3.

Verified across flagged problem cases plus a few existing ones:

util:log(                  → sigs=1, active=#2, activeParam=0
util:log("info",           → sigs=1, active=#2, activeParam=1
util:log("info", "x",      → sigs=1, active=#2 (clamped), activeParam=1
substring("abc",           → sigs=2, active=#2, activeParam=1
substring("abc", 2,        → sigs=2, active=#3, activeParam=2
substring("abc", 2, 1)     → sigs=2, active=#3, activeParam=1   (regression: still right)
count(                     → sigs=1, active=#1, activeParam=0
1 + 2                      → null (not in a call)
foo:bar(                   → null (unknown namespace prefix)

The XQueryLexer / XQueryParser / TreeParser scaffolding is gone from SignatureHelp.java — text-scan is sufficient and avoids the "parse must succeed" trap that was eating the mid-typing cases.

Two new cypress tests; all 17 langservice tests pass; PMD clean.

Hover on a variable reference now reports the variable's type:

  let $x := 1 return $x              → `\$x` as `xs:integer`
  let $y := //para return $y         → `\$y` as `node()*`
  for $z in (1,2,3) return $z        → `\$z` as `item()`
  function $items as item()* → ...   → `\$items` as `item()*`
                                        plus the parameter docstring

Implementation: a new VariableTypeFinder visitor walks let/for
bindings, prolog VariableDeclarations, and user-function parameters
(via UserDefinedFunction.getSignature().getArgumentTypes()). The
latest-declared binding whose source position is at-or-before the
cursor wins, mirroring lexical scope's "innermost shadows" without
re-implementing full scope tracking.

For let/for, BindingExpression's declared sequenceType field is
protected with no public getter, so the displayed type is synthesized
from the bound expression's analyzer-inferred returnsType +
cardinality. For for-loops, cardinality is overridden to EXACTLY_ONE
(the bound variable iterates one item at a time, not the whole input
sequence). The user's literal type annotation (e.g.
"let $x as xs:integer := 1") isn't recoverable through public API;
the inferred type usually matches anyway.

For function parameters, both the declared type and the parameter's
own description (from FunctionParameterSequenceType.getDescription())
appear in the hover body.

Cypress: three new tests covering let-bound, for-bound, and
user-function-parameter variable hover.

Closes the oxygen-plugin team's "(3) Variable hover type info"
recommendation on eXist-db#44.
@joewiz

joewiz commented Jun 5, 2026

Copy link
Copy Markdown
Member Author

[This response was co-authored with Claude Code. -Joe]

Commit f96e9f8 adds "variable hover type info":

let $x := 1 return $x              → `$x` as `xs:integer`
let $y := //para return $y         → `$y` as `node()*`
for $z in (1,2,3) return $z        → `$z` as `item()`
function ($items as item()*) {…}   → `$items` as `item()*`
                                      plus the parameter docstring

Implementation: a new VariableTypeFinder visitor walks let/for bindings, prolog VariableDeclarations, and user-function parameters (via UserDefinedFunction.getSignature().getArgumentTypes()). The latest-declared binding whose source position is at-or-before the cursor wins — mirrors lexical scope's "innermost shadows" without re-implementing full scope tracking.

One implementation note worth flagging: for let/for bindings, BindingExpression's declared sequenceType field is protected with no public getter. So the displayed type is synthesized from the bound expression's analyzer-inferred returnsType + cardinality. The user's literal annotation (e.g. let $x as xs:integer := 1) isn't recoverable through public API; the inferred type usually matches anyway. For for loops the cardinality is overridden to EXACTLY_ONE since the bound variable iterates one item at a time, not the whole input sequence.

Function parameters get both the declared type and the parameter docstring (from FunctionParameterSequenceType.getDescription()).

Three new cypress tests; all 20 langservice tests pass; PMD clean.

This closes the recommendations after a review of PR #44:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant