Joint Tagger to support probability-aware POS tagging by marcbachmann · Pull Request #3717 · Automattic/harper

marcbachmann · 2026-06-25T06:39:22Z

Issues

Fixes #3680 — MissingPreposition false positive on predicate nouns/adjectives ("We were best friends").
Fixes #3204 — MissingPreposition false positive ("It's been many years since I've…").

Both are verified before/after against master (see How Has This Been Tested?). This PR also removes
four real-world MissingPreposition / ThereToTheir false positives from the Great Gatsby / Alice in
Wonderland test corpus, and hardens a class of homograph-driven false positives/negatives that the same
linter family is repeatedly reported for (e.g. #1585, #3089, #3087, #2856, #3700, #1947, #3203, #2301).
I've intentionally not written Fixes for those, because against current master they either already
pass or need a separate look — see the table in How Has This Been Tested? for exactly which ones I could
prove and which I couldn't (e.g. #3214 still false-positives and is out of scope here).

Depends on #3694. That PR (the CoNLL-U training-data extraction refactor) is the foundation this model
is trained on. This branch is stacked on top of it; merge #3694 first, or review this with #3694's single commit as the base.

Description

Replaces Harper's two separate English models — the JSON Brill POS tagger and the Burn NP chunker — with a single small neural joint model that produces both per-token UPOS tags and noun-phrase membership from one shared encoder, and uses the model's probability distribution to fix a whole class of homograph linter bugs the old argmax-only tagger couldn't.

Sorry, this is a lot to take in at once. 🙈
But I'm opening this PR for now to receive input for a first step and to figure out which steps this project is willing to go. The motivation for me is to introduce other language support like in #3402 while keeping the binaries small for those.

What Changed, in Iterations

1. The joint model (harper-pos-utils::joint)

One char-CNN + word-final-suffix encoder → BiLSTM → two heads (UPOS classification + NP membership).
Inference is CPU-only and needs no extra runtime deps; the GPU/training path is gated behind a training feature so the default build stays lean.
harper-brill::brill_tagger() and burn_chunker() are now both served by the one embedded, memoized per-thread JointRuntime. New annotator() returns tags + plausible-tag sets + NP flags from a single cached forward pass. Some cleanup should probably still happen before this lands completely (If you think nobody needs the old setup, I can remove those completely. The code was built for compatibility while removing the old models already out of the build. The assets are still there).

2. Probability-aware POS matching (the reason this is worth doing)

A neural tagger resolving a genuine homograph ("books" = NOUN/VERB, "for" = ADP/SCONJ) often ranks the contextually-correct tag a hair below the argmax. Strict-argmax linters silently drop that reading.
New pos_tag_topk (runtime-only, #[serde(skip)]) carries the plausible-tag set; could_be_pos / could_be_upos and UPOSSet::new_loose let a linter opt into matching a plausible reading.
Exposed in the Weir rule DSL as the per-token ~TAG operator (e.g. ~NOUN).
Strictly opt-in per rule — loosening globally over-fires the precision guardrails that depend on the argmax, so each rule was tuned individually.
Sadly still 3 times slower on cold inference compared to brill and the burn chunker. But if accuracy is the target, this could be the way forward as the drawback is not that big if other rules can be simplified.

3. Per-rule linter fixes built on that infrastructure (each commit explains the exact homograph frame):
possessive_noun, noun_verb_confusion, its_contraction, missing_preposition (now restricted to a curated PP_TAKING_ADJECTIVES allowlist), to_two_too, were_where, there_to_their, all_ready, missing_determiner, missing_to.

4. Performance (cold inference is the bottleneck for one-shot CLI/LSP runs):

SIMD on native (burn-ndarray simd): ~1.8× faster cold inference.
Single cached forward pass + SmallVec tag sets: ~7% faster warm parse_essay, fewer allocations.
burn 0.19.1 → 0.21.0, inference on burn-flex: ~5.6× faster cold inference (0.53 ms vs 2.97 ms / sentence), byte-identical output. Real-prose parse_essay 5.9× faster, lint_essay_uncached 3.4× faster.

Model quality

UPOS accuracy 0.9463, NP-chunk F1 0.8872 (UD English dev).
Embedded artifact footprint drops ~2.05 MiB → ~787 KiB (≈62% smaller): one model.mpk (752 KiB) + two small vocab JSONs, replacing the old tagger JSON + chunker model + chunker vocab + chunker JSON. Good for the WASM bundle and the harper-ls binary (cf. Reduce binary size #1945).

Reproducibility / faster iteration

The whole model is regenerable end-to-end via the regenerate_english_joint example (gated behind
training): it auto-fetches the UD treebanks, trains deterministically (seeded), and writes the embedded artifacts. joint_determinism_check verifies the training is reproducible.
Training now takes ~14 minutes on an M3 MacBook Air (30 rounds), versus the multi-day runs the old pipeline needed. This makes the model cheap to iterate on — retuning or retraining on updated data is an afternoon, not a week.

Demo

Before/after on two reported false positives (American dialect, default rule set):

$ harper-cli lint sample.md      # "We were best friends."
master:  MissingPreposition  (false positive)   →   this PR:  clean
$ harper-cli lint sample.md      # "It's been many years since I've seen her."
master:  MissingPreposition  (false positive)   →   this PR:  clean

Real-prose corpus diff (harper-core/tests/text/linters/*.snap.yml) — 4 false positives removed, 0 added:

Source	Sentence	Old lint
Alice	"It was high time to go…"	`MissingPreposition`
Gatsby	"We were close friends."	`MissingPreposition`
Gatsby	"They were careless people…"	`MissingPreposition`
Gatsby	"…that there crystal glass."	`ThereToTheir`

How Has This Been Tested?

Full harper-core suite (5349 lib + integration tests) green; clippy clean; POS-tagger and prose-lint snapshots regenerated and reviewed (the prose-lint snapshot diff is removals only).
New unit tests per touched linter, using the reported sentences and real prose.

Before/after built harper-cli from master (2.6.0) vs this branch and diffed the lint output on the reported sentences:

Issue	Sentence	master	this PR	verdict
False positive: "We were best friends" flagged for missing preposition #3680	"We were best friends."	`MissingPreposition`	clean	fixed
False positive: `MissingPreposition` triggered by "It's been many years since I've..." #3204	"It's been many years since I've seen her."	`MissingPreposition`	clean	fixed
—	"The students are interested learning."	`MissingPreposition`	`MissingPreposition`	recall kept ✓
—	"He forgot send the file."	`MissingTo`	`MissingTo`	recall kept ✓
—	"Check were the error occurred."	`WereWhere`	`WereWhere`	recall kept ✓
Spuriously flagging to insert a preposition #1585 / False Positive: `want` -> `want to` preceeding e.g. 'phase one' (`MissingTo`) #3089 / False Positive: 'wanted revenge' -> 'wanted to revenge' (`MissingTo` #3087 / False positive in `ToTwoToo` #2856 / False positive flagging "there" to change to "their" #3700	(reported sentences)	already clean	clean	guarded under new tagger, but no master delta to claim
False positive: "What do you think were the" -> "What do you think where the" (`WereWhere`) #3214	"What do you think were the best moments?"	`WereWhere` (FP)	`WereWhere` (FP)	not fixed — out of scope

The regression-guard rows matter: the tagger swap does not cost recall on the real errors.

AI Disclosure

I am a human and didn't use any AI.
I used LLM features of my editor, but not an agent.
I used an AI agent interactively.
I am an agent or I got an agent to do the work autonomously.

If Your PR Implements or Enhances a Linter

I made up the sentences in the unit tests.
The sentences in the unit tests were generated by an AI.
I'm using examples from the bug report / feature request.
I collected real-world sentences for the unit tests.

Checklist

I have performed a self-review of my own code
I have added tests to cover my changes
I have considered splitting this into smaller pull requests.

…_utils Five training-data extraction sites — the Brill tagger (`epoch`, `FreqDictBuilder`) and the Brill / Burn / UPOS-frequency chunkers — each hand-rolled their own CoNLL-U sentence walk: merging contraction suffixes with a copy-pasted list, iterating raw token rows, and (for the chunkers) locating noun phrases over those raw rows. That last part shifts every noun-phrase span in a sentence containing a multiword-token range (e.g. English "cannot", "don't") or an ellipsis node, so the extracted tokens, tags and NP labels are wrong for such sentences. Move the extraction into one place — `conllu_utils` — built on a single multiword-token/contraction-aware sentence walk: - `syntactic_words` / `walk_sentence` merge UD component rows back into the surface tokens Harper actually tokenizes at runtime ("don't" is one token), keeping a span map so noun-phrase flags (computed on the syntactic words) fold back onto the merged tokens without shifting. - `sentence_to_training_pair` / `sentence_to_training_record` and the `extract_pairs_from_files` / `extract_records_from_files` batch helpers return `(tokens, tags[, NP mask])`. - `locate_noun_phrases_in_words` (moved here from `chunker::np_extraction`, now operating on syntactic words) is the shared NP locator. All five sites call these helpers; the duplicated loops are gone and the multiword-token / contraction span-shift is fixed. Inference and the shipped models are unaffected — these are training-data paths only. Regenerating the Brill tagger, Burn chunker or frequency dictionaries will produce corrected data on sentences with multiword tokens or contractions.

Introduce `harper_pos_utils::joint`: a single neural model that produces both per-token UPOS tags and noun-phrase membership from one shared encoder, replacing the separate Brill tagger + Burn NP chunker for English. Architecture: a character encoder (parallel convolutions over char embeddings + a word-final suffix embedding) feeds a BiLSTM, which fans out to two heads — a UPOS classification head and an NP-membership head. Inference runs on burn's NdArray (CPU) backend; the training path (`train`, `inject`, `eval`, `conllu_utils`) is gated behind the existing `training` feature so the default build stays GPU-free. Tag classes are 0-based and positional: class `i` is the `i`-th `UPOS` variant in declaration order (ADJ..VERB), with class 16 reserved for padding. Also add `Tagger::tag_sentence_topk`, a default-provided method returning the set of *plausible* tags per token (argmax plus close runner-ups). The joint runtime overrides it with the model's softmax distribution; this is what later lets linters resolve genuine homographs the argmax alone can't.

Replace the separately-loaded Brill tagger and Burn NP chunker with the single embedded joint model. `brill_tagger()` and `burn_chunker()` now both return the one memoized, per-thread `JointRuntime` (an `Rc<dyn …>`, since burn's NdArray model is `!Sync`). The trained artifacts (`finished_joint/model.mpk` + the char/suffix vocabularies) ship embedded; inference is CPU-only and needs no extra dependencies. Reproducibility: the `regenerate_english_joint` example (re)trains the model end-to-end and is gated behind a new `training` feature that pulls burn's autodiff + GPU backend. It auto-fetches the UD treebanks, trains deterministically (seeded), and writes the embedded artifacts; `joint_determinism_check` verifies the training is reproducible. The `diagnose_lint_words` / `eval_dev` / `probe_pos_probs` examples are inference-only diagnostics.

The joint model assigns different (and overall more accurate) UPOS tags than the Brill tagger, so the `pos_tags` snapshots are regenerated to match. Pure tagging output; no linting behavior is involved here. This is the boundary of the model layer: English tagging + chunking now run on the joint model and the `pos_tags` suite is green. The lint suite is not yet green at this commit — swapping the tagger shifts a handful of genuine homographs (NOUN/VERB, ADP/SCONJ, ADV/ADJ) onto the model's rank-2 reading, which the strict-argmax linters discard. The following "linting layer" commits expose the model's distribution to those rules and close the gap.

A neural tagger resolving a genuine homograph ("books" = NOUN or VERB, "for" = ADP or SCONJ) often ranks the contextually-correct tag a hair below the argmax. Strict-argmax linters discard that reading and miss the error. Instead of perturbing the model, expose its distribution to the linters: - `DictWordMetadata.pos_tag_topk` carries the plausible-tag set (argmax plus close runner-ups) alongside `pos_tag`. It is runtime-only (`#[serde(skip)]`) and stores a tag *set*, so the type keeps its `Eq`/`Hash`/`Ord` derives. - `DictWordMetadata::could_be_pos` / `TokenKind::could_be_upos`: "is this a plausible reading?" — the probability-aware counterpart to `is_upos`. - `UPOSSet::new_loose`: a `UPOSSet` that matches on `could_be_pos` rather than the strict argmax. - `Document::parse` populates `pos_tag_topk` from the tagger's top-k. This is strictly opt-in: every existing call site keeps strict-argmax matching until it deliberately switches. Loosening globally over-fires the precision guardrails that depend on the argmax, so it is applied per rule.

Expose probability-aware POS matching in the weir rule DSL. Writing `~NOUN` (the existing tilde punctuation followed by a UPOS tag) compiles to a `UPOSSet::new_loose`, so a weir rule slot matches a plausible reading of the token rather than only the strict argmax — the DSL equivalent of `could_be_upos`. Like the Rust API, it is opt-in per slot.

…omograph heads The possessed head noun is frequently a noun/verb homograph ("song", "books", "shop") that the joint tagger ranks VERB#1, NOUN#2 in this out-of-distribution possessive-mistake frame. Switch the head slot from `UPOSSet::new` to `new_loose` so the plausible NOUN reading still matches. The rule's trigger is rare and specific, so loosening here introduces no prose false positives.

…raph modifiers In "deep breathe" / "sound affect", the modifier ("deep", "sound") is a homograph the joint tagger ranks ADJ#2 / NOUN#2 behind a verb reading. - verb_instead_of_noun: loosen the leading ADJ/ADP slot to `new_loose`; the existing AUX guard and curated verb list keep it from over-firing. - affect_to_effect: treat a VERB-tagged token that could also be a NOUN as valid preceding context (it modifies the affect/effect head rather than governing it).

… clausal "for" In "its common for users to …", the "for" introducing the infinitival clause is ranked ADP#1, SCONJ#2 by the joint tagger. Match the plausible SCONJ/PART reading (`new_loose` in the pattern, `could_be_upos` in the follow-up check) so the "its → it's" contraction error is still caught.

… clauses Add an `imperative_were_clause` arm: a sentence-initial imperative verb ("Check were the error occurred" → "where") is mis-tagged NOUN/PROPN with VERB only a low-but-plausible reading, so loose-match the leading VERB — but anchored to sentence start. The anchor is what makes this precise: it excludes existential "There were …" (no plausible VERB reading on "There") and mid-sentence "…people there were…" (the start anchor never matches), the two false positives a blanket loose match produced. Also loosen the "you where <verb>" slot so adverb/adjective homographs ("right", "sure") match.

…ctive "too stiff/slippery/polite" is correct excess-degree usage, but the joint tagger sometimes argmaxes the adjective as VERB. Add a probability-aware suppress guard: don't rewrite "too → to" when the following token could plausibly be an ADJ (`could_be_upos`). Probability tolerance in reverse — the rule fires only when the adjective reading is implausible.

…djectives Only fire when the predicate adjective is one that idiomatically takes a prepositional complement (interested IN, famous FOR, afraid OF) — a curated `PP_TAKING_ADJECTIVES` allowlist. A general descriptive adjective merely modifies the following noun attributively ("live hedgehogs") or heads a clause ("sure he'd start"); both are structurally identical to the real error, so the adjective itself is the only signal that separates them. Deliberate trade-off: precision over recall. Also clears pre-existing baseline false positives (high time / close friends / careless people).

Add the tagger-resistant possessed nouns (ladder, answer, server, understanding) to the rule's existing literal escape-hatch array (alongside mittens/pawn/…). Precise by construction — it only ever matches "there <these exact nouns>", so it cannot fire on locative "there".

…ith `~TAG` These rules trigger on rare, specific constructions, so probability tolerance is safe. Switch their adjective/adverb slots to the loose `~ADJ` / `~ADV` operator so the plausible reading matches when the joint tagger ranks the adjective/adverb a hair below the argmax.

…fident noun The permissive pattern matches on dictionary verb-lemma *capability*, so a controller word can pair with any word that is also a verb in the dictionary ("expected feature" — "feature" is a verb lemma). Add a probability-aware guard: if the tagger is confident the following word is a non-verb noun (no verb reading even among its plausible top-k tags), there is no infinitive to complete, so don't fire — "a standard and expected feature" is a nominal phrase, not "expected *to* feature". A genuine homograph ("site selling …", where "selling" keeps its VERB reading) is unaffected and still fires.

…d, 0 added) Regenerate the prose lint snapshots for the joint model + probability-aware linting. The net change is purely the removal of four pre-existing false positives on the prose corpus — "high time" / "close friends" / "careless people" (missing_preposition) and "that there crystal glass" (there_to_their). No new lints are introduced: the diff is removals only.

Fold tagging, probability-aware top-k, and NP chunking of a sentence into a single forward pass instead of two, and expose it through one generic helper: - `JointRuntime::infer` derives the argmax tags, the plausible-tag (top-k) sets, and the NP flags from one model run, cached per sentence. Previously `tag_sentence` and `tag_topk` each ran their own forward pass; only the chunk call was a cache hit. The top-k set now lists the argmax first, so the chosen tag and the set share one source of truth. - The cache lookup borrows the `&[String]` slice (`Vec<String>: Borrow<[String]>`, identical hash), so a hit costs only the hash — no key clone; the owned key is allocated only on a miss, alongside the forward pass that dwarfs it. - `Document::parse` calls one generic `annotate(tagger, chunker, sentence)` returning `(plausible-tag sets, NP flags)`. It works for any Tagger/Chunker: the joint model collapses the two trait calls to the single cached forward; a separate tagger + chunker run independently. `pos_tag` is `tags[i].first()`. Behavior-preserving: the full suite stays green and the POS/lint snapshots do not change.

…ached pass Introduce `Annotator: Tagger + Chunker` so a sentence's plausible-tag sets and NP-membership flags come from a single `annotate()` call backed by one cached forward pass, replacing the separate tag_sentence_topk + chunk_sentence path. Per-token tag sets are now `TagSet = SmallVec<[UPOS; 4]>` (argmax first), which flows straight into harper-core's `pos_tag_topk` without re-collecting and keeps the common 1-3 tag set inline (no heap alloc). Drops the duplicate argmax `Vec` and removes the now-redundant `tag_topk` / inherent `annotate` methods. ~7% faster parse_essay (warm) from the dropped allocations.

… cold inference) Gate burn-ndarray's `simd` (and the `std` it transitively needs) to non-wasm32 targets via a target-specific dependency table. SIMD nearly halves cold per-sentence inference (parse_essay cache-off ~1.26s -> ~0.68s on aarch64); the gemm was already NEON-vectorized through matrixmultiply, so the win is in the elementwise / LSTM-gate ops the `simd` feature accelerates. The feature is not no_std-clean in burn-ndarray 0.19.1 (its ops/simd modules use Vec/vec! without `use alloc::vec::Vec`), so it also requires `std`; the wasm32 build keeps the no_std scalar path. Full lint suite stays green with SIMD active (no FLOOR-sensitive top-k membership flips), and the snapshots — originally generated on the scalar path — still match, so native-SIMD and wasm-scalar agree across the test corpus.

Two changes to JointRuntime::infer's per-token decode: - index_to_upos is now a direct `match` (jump table) instead of scanning `UPOS::iter().find(...)`, dropping the `strum::IntoEnumIterator` import. The arms mirror the enum's declaration order; the test is reworked into a drift guard that checks every arm against the real discriminant, so a reorder or new variant fails loudly instead of silently mis-tagging. - The chosen tag is now derived from the logits in-loop via `argmax_first` rather than a separate `argmax(2)` tensor op: this reads `tag_logits` once (no clone, no second into_data, no class buffer) and folds the argmax and the softmax max-shift into one per-row scan. `argmax_first` is a strictly-greater, first-max scan that mirrors burn-ndarray's argmax tie-break exactly (pinned by a unit test), so the chosen tag — and every lint — stays bit-identical. Verified zero lint drift: full harper-core suite (5349 lib + integration) green, clippy clean. Cold parse_essay is unchanged-to-marginally-faster (~1%, within the run-to-run noise floor; the decode is a negligible fraction of the forward).

Two allocation cuts on the cache-miss inference path: - Split JointBatch::build into build_inference (char-id + suffix-id buffers only — the inputs forward() reads) and build (which calls build_inference then fills the gold tag/np/mask label buffers used solely by the training loss). JointRuntime::infer now calls build_inference, dropping the two dummy tag/np vectors plus the three dead label-buffer allocations and their fill per sentence. build keeps its signature, so the training path is unchanged; the label accessors gain a debug_assert that fires if they are called on an inference batch (empty label buffers). - The per-token softmax writes into a fixed `[f32; TAG_CLASSES]` stack buffer instead of allocating a `Vec<f32>` per token. Output bit-identical (full harper-core suite green, zero lint drift; clippy clean; training compiles), confirmed by an adversarial review of the diff. Cold-path only and below the parse_essay noise floor, but removes real dead work.

…ence Upgrade burn 0.19.1 -> 0.21.0 and switch the joint tagger's runtime inference backend from burn-ndarray to burn-flex, a fast pure-Rust CPU backend (gemm + SIMD). flex is now the only inference backend, on every target including wasm32 (no rayon + critical-section atomics shim there). - cold inference ~5.6x faster (0.53ms vs 2.97ms/sentence), byte-identical output; real-prose parse_essay 5.9x faster, lint_essay_uncached 3.4x. - retrain the embedded model on 0.21 (UPOS 0.9463 / NP F1 0.8872); the 0.19 model.mpk can't load on 0.21 (version guard + new LSTM gate_activation field). - burn 0.21 moved Backend::Device to a BackendTypes supertrait; name the concrete device type in runtime.rs instead. - regenerate_english_joint: AutoGraphicsApi instead of hardcoded Metal so training runs on any wgpu GPU (Metal/Vulkan/DX12). - save_then_load_preserves_forward: rewrite as an fp16->fp16 round-trip equality check (the old fp32-vs-fp16 threshold failed on both backends under 0.21's RNG, not a flex issue). - char/suffix vocab JSON now pretty-printed with ordered keys for diff-friendly artifacts; key order is cosmetic (ids are the values). burn-ndarray remains only for the legacy BurnChunker and tests.

hippietrail · 2026-06-25T08:56:12Z

Alice "…that there crystal glass."

This is also from The Great Gatsby, chapter IV, not from Alice in Wonderland:

“He’s a bootlegger,” said the young ladies, moving somewhere between his
cocktails and his flowers. “One time he killed a man who had found out that he
was nephew to Von Hindenburg and second cousin to the devil. Reach me a rose,
honey, and pour me a last drop into that there crystal glass.”

…n tooling - train_joint gains a periodic-eval hook: every N epochs (and the last) it hands the validated inner model to a callback. The regenerate example scores it on the production fp16 + flex path (eval_via_flex), so the logged UPOS/NP-F1 curve reflects what ships, not the fp32 training graph. - JOINT_OUT / JOINT_EPOCHS env overrides so sweeps train into a scratch dir without clobbering the committed model. - Default epochs 100 -> 30, chosen from the dev plateau: dev UPOS peaks at epoch ~25-30 and declines past it (mild overfit). A 30-epoch cosine schedule reaches UPOS 0.9479 / NP F1 0.8972 vs the old 100-epoch 0.9463 / 0.8872, in a third of the wall time. Keep AutoGraphicsApi (GPU-agnostic) with a comment.

…ty-aware Regenerate the embedded joint model at 30 epochs (UPOS 0.9479 / NP F1 0.8972). The model's argmax POS tags shift on a number of borderline tokens, so make the lints that keyed off the strict argmax read the tagger's top-k instead, and add a dictionary escape where the right tag fell below the top-k floor entirely: - missing_to: gate the want/need noun-suppression on could_be_verb (top-k). - theses_these: match the head noun with UPOSSet::new_loose. - its_possessive: exclude adjective-only words (dict adjective, not dict noun) from the possessed-noun slot ('it's insincere' stays a contraction). - to_adverb: add 'maybe' to the by-word adverb escape (tagged VERB here). - weir HaveNegNoAny / RainbowColoredHyphen: loosen the final NOUN slot to ~NOUN. - weir MissingDeterminer: guard the ~ADJ modifier with <~ADJ, !DET> so 'another' (argmax DET, inflated ADJ runner-up) isn't read as a bare noun phrase. - weir ThereToTheir: add (there <VERB, ~NOUN> [VERB, AUX]) for 'There <noun> <verb>' errors where the subject noun tags VERB, without firing on existential 'there' (there must be / here and there / that there crystal). Also: run_tests_for_weir_rules now aggregates every rule's failures instead of panicking on the first (which had masked later regressions), and the POS-tagger snapshots are regenerated for the new model. Lint snapshots are unchanged. Full suite: 5351 passed, 0 failed; test_most_lints + test_pos_tagger green.

marcbachmann added 23 commits June 21, 2026 18:10

style: Apply formatting

5d7d804

chore: Remove unused files

36caedc

marcbachmann force-pushed the joint-tagger branch from 507d373 to 03b8968 Compare June 25, 2026 06:46

marcbachmann force-pushed the joint-tagger branch from 03b8968 to 0f0a336 Compare June 25, 2026 06:57

chore: Fix lint issues

48a4eee

hippietrail requested a review from elijah-potter June 25, 2026 07:51

marcbachmann force-pushed the joint-tagger branch from 41602cb to 3865f1e Compare June 25, 2026 21:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Joint Tagger to support probability-aware POS tagging#3717

Joint Tagger to support probability-aware POS tagging#3717
marcbachmann wants to merge 27 commits into
Automattic:masterfrom
marcbachmann:joint-tagger

marcbachmann commented Jun 25, 2026 •

edited

Loading

Uh oh!

hippietrail commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

marcbachmann commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issues

Description

What Changed, in Iterations

Model quality

Reproducibility / faster iteration

Demo

How Has This Been Tested?

AI Disclosure

If Your PR Implements or Enhances a Linter

Checklist

Uh oh!

hippietrail commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

marcbachmann commented Jun 25, 2026 •

edited

Loading