Snapshot from 2026-06-20 comparing SQL function catalogues from six reference databases against what SQLink ships. Drives the "what to port next" decisions for the extension survey.
gap-inventories.json (572 KB): full per-DB function listings,
one block per database. Hand-curated from each DB's vendor docs.
| DB | Functions | Source |
|---|---|---|
| postgresql (PG 17) | 293 | postgresql.org chapter 9 |
| mysql | 295 | dev.mysql.com 8.x reference |
| mariadb | 175 | mariadb.com knowledge base |
| duckdb | 274 | duckdb.org functions docs |
| clickhouse | 650 | clickhouse.com functions ref |
| snowflake_bigquery | 385 | combined snowflake + bq |
Each record carries name, category, arity, summary and
optional aliases. Operators excluded; aliases collapsed to a
canonical entry. See the notes field on each DB block for what
was specifically in/out of scope.
gap-report.json (64 KB): the ranked-by-coverage diff. The
gap_summary array holds 146 distinct function names that appear
in 2 of 6 source DBs but were NOT in SQLink's catalogue when
the snapshot was taken. Each entry includes dbs_have_it,
dbs_count, rough_summary, arity, and suggested_extension
(which existing or proposed extension would host the port).
141 / 146 (97%) of identified gaps are closed. The survey work driven by this report shipped through tasks #343-#347 (stdsql, sys-compat, list, stats extension, chrono extension) and the pre-existing text-utils / math extensions.
5 entries remain and are all dispositioned skip or core:
| Function | DBs | Disposition |
|---|---|---|
cast |
4 | covered: CAST keyword (SQLite has it) |
decode |
4 | ambiguous (PG bytea vs MySQL legacy crypto) |
convert |
3 | ambiguous (charset vs type conversion) |
encode |
3 | ambiguous (PG bytea vs string->bytes) |
grouping |
2 | core SQLite: needs GROUPING SETS support first |
cast is functionally covered by the SQL keyword.
decode/encode/convert would each need a vendor-specific
extension to disambiguate semantics not worth shipping a
generic name that doesn't match either expectation.
grouping requires GROUPING SETS / ROLLUP / CUBE in the
parser, which SQLite doesn't have today.
The inventories are hand-curated; there's no scraper to re-run. To refresh after a meaningful upstream release:
- Edit
gap-inventories.jsondirectly to add / update a DB'sfunctionsarray. - Regenerate
gap-report.jsonby intersecting againstprovenance/extensions.db'ssql_functiontable:
import json, sqlite3
inv = json.load(open("analysis/gap-inventories.json"))
con = sqlite3.connect("provenance/extensions.db")
have = {r[0].lower() for r in con.execute("SELECT DISTINCT lower(name) FROM sql_function")}
# For each function across all DBs, count how many DBs include it.
from collections import defaultdict
db_of = defaultdict(set)
meta = {}
for d in inv:
for f in d["functions"]:
nm = f["name"].lower()
db_of[nm].add(d["database"])
meta.setdefault(nm, f)
gap_summary = []
for nm, dbs in db_of.items():
if nm in have or len(dbs) < 2: continue
f = meta[nm]
gap_summary.append({
"function_name": nm,
"category": f.get("category", ""),
"dbs_have_it": sorted(dbs),
"dbs_count": len(dbs),
"rough_summary": f.get("summary", ""),
"suggested_extension": "TBD",
"arity": f.get("arity", ""),
})
gap_summary.sort(key=lambda g: (-g["dbs_count"], g["function_name"]))
json.dump({"gap_summary": gap_summary}, open("analysis/gap-report.json", "w"), indent=2)The suggested_extension field is curated manually it expresses
intent ("this belongs in stats", "stub in sys-compat") rather than
existence. After regenerating, sweep the new entries and assign
each one.
When this is re-run, expected additions:
- PostgreSQL 18 (Sept 2025) added a handful of bytea + jsonb helpers; revisit the json1 extension's surface.
- DuckDB ships a fast catalogue; quarterly resync is realistic.
- ClickHouse adds analytic functions aggressively always bumps the 650-count higher.
Task #338 closes here. Future refreshes can land as their own small tasks (regenerate + diff + port the new fan-out-of-3+ entries).