/ref/index: Browseable IDs/URLs tree #48

@gwern

Right now, /ref/$ID provides a way to look up an ID like smith-et-al-2026 or a URL like /doc/2026-smith.pdf, with various heuristics and fallbacks that try to repair a failed lookup by consulting the whole universe of URLs/IDs (bucketed in various JSON files for efficiency, and available as a whole as a ~7MB /metadata/annotation/id/all.json object with key-value pairs like {"/2012-election": "gwern-2012-election", "/2012-election.md": "_HnxeFaYB", ... }{.JSON}). See LinkID.hs & writeOutID2URLdb{.Haskell} for more background on the ID system.

This is an implicit search/navigation, but we do not provide direct browsing capability for those who want to see what is there or only remember a prefix or infix, say.

We can provide, dynamically and client-side, a classic 'directory tree' structure for IDs and another for URLs, which lets the reader drill down by prefix or view the entire set of ~80k ID/URL pairs: JS downloads all.json and renders a lazily-expanded tree using disclosure-collapses. (This is a major performance hit, but we do not expect many users to use /ref/index often, we are not loading it by default anywhere, and it would be far from the most bandwidth-hungry page on Gwern.net. So I regard a ~7MB download as an acceptable cost for allowing instant access, once loaded, to anywhere in the directory tree, rather than taking the constant hit of downloading the ~64 other bucketed JSON files and needing to download them all anyway if the user wants to do a global search.)

This will be a special case on the reference URL /ref/index, which is currently a null lookup because the placeholder JS doesn't know how to handle it. The JS will simply do the equivalent of location.pathname == "/ref/index"{.Javascript}. (The feature was inspired in part by noticing that it currently just results in "ID index does not exist.", which would surprise a Gwern.net reader who expects index to do something useful like list all references.)
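
A minimal sketch of that special case, assuming we also want to tolerate a trailing slash (that normalization is my assumption, not current behavior). `isRefIndexPath` is a hypothetical name, as are the dispatch functions mentioned in the closing comment:

```javascript
// Hypothetical helper: is this path the special /ref/index page?
function isRefIndexPath(pathname) {
    // Assumption: treat "/ref/index/" the same as "/ref/index".
    const normalized = pathname.replace(/\/+$/, "");
    return normalized === "/ref/index";
}

// In the browser, the dispatch might then look something like this
// (renderRefIndex and lookUpRef are placeholders, not existing functions):
//   if (isRefIndexPath(location.pathname)) renderRefIndex();
//   else lookUpRef(location.pathname);
```

One consequence is that `index` becomes a reserved name: a real ID spelled `index` could no longer be looked up this way.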

The idea is to treat each letter in an ID/URL as a level of the hierarchy (similar to tries), and to stop the recursive expansion when we hit a 'reasonable number' to display at a single time. Let's say 30 in a node for starters: at any node, if total descendants ≤ 30, stop splitting and show them as a flat bucket. This tries to make the tree as flat as possible and reduce the number of jumps, bundling things together as ranges, and possibly making this a radix tree. Duplicate keys are combined (duplicate ID leaves rendered as one ID with multiple URL links).
If at some node the per-letter partition produces many small buckets (a:3, b:2, c:5 …), group consecutive sparse letters into one [a–c] super-bucket until adding the next letter would exceed the size limit. The algorithm goes something like:

  1. Build records sorted by displayed key.
  2. At each node, if descendant leaf count ≤ 30, render a flat bucket.
  3. Otherwise partition by next Unicode code point.
  4. Merge consecutive low-count sibling partitions into range buckets until adding the next would exceed 30.
  5. Collapse single-child chains into radix labels, unless the keyboard navigator (see below) is active and needs per-character stepping.
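
The steps above can be sketched roughly as follows. This is only an illustration under assumptions: all function and field names are hypothetical, 'low-count' is read as 'fits in one bucket', keys are sorted with plain string comparison, and single-child chains are always collapsed (the keyboard-navigator exception is ignored).

```javascript
const BUCKET_LIMIT = 30; // the "reasonable number" per node suggested above

// Step 1: combine duplicate keys, then sort by displayed key.
// `pairs` is an array of [key, value].
function toSortedRecords(pairs) {
    const byKey = new Map();
    for (const [key, value] of pairs) {
        if (!byKey.has(key)) byKey.set(key, []);
        byKey.get(key).push(value);
    }
    return [...byKey]
        .map(([key, values]) => ({ key, chars: Array.from(key), values }))
        .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Display form of one partition character; "" marks a key that ends here.
const showChar = (c) => (c === "" ? "(end)" : c);

// Step 4: merge consecutive sparse partitions into range buckets.
function mergeSparsePartitions(parts, depth, limit) {
    const children = [];
    let run = null; // the range bucket currently being filled, if any
    const flush = () => {
        if (run === null) return;
        const label = run.from === run.to
            ? showChar(run.from)
            : `${showChar(run.from)}–${showChar(run.to)}`;
        children.push({ label, leaves: run.records });
        run = null;
    };
    for (const [c, group] of parts) {
        if (group.length > limit) {
            // Too big for a bucket: recurse one code point deeper.
            flush();
            const child = buildIndexNode(group, depth + 1, limit);
            child.label = c + child.label;
            children.push(child);
        } else if (run !== null && run.records.length + group.length <= limit) {
            run.to = c;
            run.records.push(...group);
        } else {
            flush();
            run = { from: c, to: c, records: [...group] };
        }
    }
    flush();
    return children;
}

// Steps 2, 3, 5: `records` all share their first `depth` code points.
function buildIndexNode(records, depth = 0, limit = BUCKET_LIMIT) {
    let label = "";
    for (;;) {
        if (records.length <= limit) return { label, leaves: records }; // step 2
        const parts = new Map(); // step 3: partition by next code point
        for (const r of records) {
            const c = depth < r.chars.length ? r.chars[depth] : "";
            if (!parts.has(c)) parts.set(c, []);
            parts.get(c).push(r);
        }
        if (parts.size > 1) {
            return { label, children: mergeSparsePartitions(parts, depth, limit) };
        }
        const [only] = parts.keys();
        if (only === "") return { label, leaves: records }; // defensive; keys are unique
        label += only; // step 5: single child, so absorb it into a radix label
        depth += 1;
    }
}
```

A node is either `{label, leaves}` (a flat bucket) or `{label, children}`. In practice the recursion would presumably be deferred until a collapse is opened, so that only the visible levels are ever built.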

So searching for an ID like smith-et-al-2026 might look like:

[\[**click to expand all (WARNING: EXPENSIVE)**\]]{.ref-expand-all-ids}

- `s`

    - `m`

        - ...

            - `26`: 
            
                - [`smith-et-al-20**26**a`](/ref/smith-et-al-2026a) ([`https://smith.com/zaphod.html`](https://smith.com/zaphod.html)),
                - [`smith-et-al-20**26**b`](...) (...), 
                - [`smith-et-al-20**26**c`](...) (...)

(At each level, the level's matching string should be emphasized to help the reader maintain their mental position.)
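
One way to produce that emphasis, as a sketch: given the code-point range that the current level contributes, wrap just that segment in `**…**` as in the mock-up above. `emphasizeSegment` is a hypothetical name, and real rendering would presumably emit HTML (with escaping) rather than Markdown.

```javascript
// Wrap the code points in [start, end) of `key` in Markdown bold markers.
// Offsets count code points, not UTF-16 code units.
function emphasizeSegment(key, start, end) {
    const chars = Array.from(key);
    const before = chars.slice(0, start).join("");
    const match = chars.slice(start, end).join("");
    const after = chars.slice(end).join("");
    // Nothing to emphasize (eg. at the root): return the key unchanged.
    return match === "" ? key : `${before}**${match}**${after}`;
}
```

For the example above, the `26` level covers code points 14 to 16 of `smith-et-al-2026a`, so `emphasizeSegment("smith-et-al-2026a", 14, 16)` yields `smith-et-al-20**26**a`.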

So then after expanding, one can easily C-f for the desired ID or the corresponding URL of the expanded IDs.

And then a parallel tree for URLs, which simply reverses the tuple to display the opposite direction (drilling down by URL to reveal available URLs and their IDs). In principle, it would be nice to allow drilling down in logical order inside a URL, like com, astralcodexten, www, query string, fragment (dropping the protocol as of no interest), but in practice I think this normalization would confuse users way too much, so we will not do that.
Both trees start collapsed, showing only the root nodes, like [a..9_].
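
A sketch of deriving both record sets from one object, assuming all.json maps URL to ID as in the sample pairs quoted earlier. `recordsForBothTrees` and the `key`/`targets` field names are hypothetical:

```javascript
// Build sorted records for the ID tree and the URL tree from a URL -> ID object.
function recordsForBothTrees(urlToId) {
    const byKey = (a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0);

    // ID tree: several URLs may share one ID, so group them under it.
    const idToUrls = new Map();
    for (const [url, id] of Object.entries(urlToId)) {
        if (!idToUrls.has(id)) idToUrls.set(id, []);
        idToUrls.get(id).push(url);
    }
    const idRecords = [...idToUrls]
        .map(([id, urls]) => ({ key: id, targets: urls }))
        .sort(byKey);

    // URL tree: the same pairs with the tuple reversed; each URL has one ID.
    const urlRecords = Object.entries(urlToId)
        .map(([url, id]) => ({ key: url, targets: [id] }))
        .sort(byKey);

    return { idRecords, urlRecords };
}
```

Both trees can then share one rendering path, since each record is just a displayed key plus the links it reveals.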

A 'click to expand' button for both allows readers to do 'global' searches using C-f, in case they know an infix or suffix or are not quite sure of prefix, or maybe they just want to see what's there, or they want to look at every link to a particular domain like astralcodexten.com, etc.

If there is demand, this could be extended with more in-page filtering controls.

Open question: keyboard focus/shortcuts. Tab would presumably jump from link to link, but not expand the collapse-disclosure elements. Can we hijack the [a-zA-Z0-9/.:?_#%-] key range to allow jumping at each level and uncollapsing while drilling down in URLs? (Some of these may already be taken by the browser and not be overridable, so we would just skip those instead of letting the perfect be the enemy of the better: we cannot seriously expect users to memorize a prefix or modifier for this one use-case!) Then the user could have a nice interactive experience of just opening the index and typing 's', 'm', 'i', 't', 'h'..., drilling down in an experience closer to classical keyboard-driven CLI navigation like Midnight Commander. (Backspace/Esc would move up a level and, at the root, switch focus to the other tree, which allows rapidly trying a different approach and undoing an errant backspace at the top.) We would default to focusing on the ID tree, and otherwise leave focus on whichever tree was most recently uncollapsed. Example workflow:

  1. User opens the index page.
  2. They press s → the tree expands the s top‑level node and highlights the first child under s (eg. sa).
  3. They press m → the tree further expands sm and highlights it (if sm does not exist, just flash an error in the status bar like "No matching ID for the prefix 'xyz'").
  4. They press i → expands smi → shows leaf nodes.
  5. Pressing Enter on a leaf could navigate to the actual citation or URL.
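
The workflow above could be modeled as a small pure state transition, which keeps the key handling testable apart from the DOM. This is a sketch under assumptions: the state shape, `stepNavigation`, and the `hasPrefix` callback are all hypothetical, the key names are the standard `KeyboardEvent.key` values, and Enter and highlighting are left to the caller.

```javascript
// Keys we would try to hijack; the hyphen is last so it is read literally.
const NAV_KEYS = /^[a-zA-Z0-9\/.:?_#%-]$/;

// state: { tree: "id" | "url", prefix: string, status: string }
// hasPrefix(tree, prefix) -> boolean: does any key in that tree start with prefix?
function stepNavigation(state, key, hasPrefix) {
    if (key === "Backspace" || key === "Escape") {
        if (state.prefix === "") {
            // At the root: switch focus to the other tree.
            return { tree: state.tree === "id" ? "url" : "id", prefix: "", status: "" };
        }
        // Otherwise move up one level.
        return { ...state, prefix: state.prefix.slice(0, -1), status: "" };
    }
    if (!NAV_KEYS.test(key)) return state; // not ours (Shift, Tab, arrows, ...)
    const next = state.prefix + key;
    if (!hasPrefix(state.tree, next)) {
        const kind = state.tree === "id" ? "ID" : "URL";
        return { ...state, status: `No matching ${kind} for the prefix '${next}'` };
    }
    return { ...state, prefix: next, status: "" }; // drill down one level
}
```

A `keydown` listener would call this with `event.key`, then expand and highlight whatever node corresponds to the new `prefix`. It should probably only call `preventDefault()` when the state actually changed, so that browser shortcuts we cannot override keep working.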

Labels: Frontend (applies to in-browser experience; primarily CSS/JS, but not the raw HTML, nginx, or backend/content); enhancement (new feature or request)
