Background
A brief lit review confirms that Needleman-Wunsch / Gotoh global DP is the dominant pairwise-alignment paradigm, and that the research frontier is mostly acceleration of the same recurrence (WFA, KSW2, Parasail/striped SW, Edlib bit-parallel, GASAL2/GPU) rather than new alignment models.
The alignment paradigm in dada2-rs is deliberately fixed to global / ends-free NW because the error-model contract requires a positional alignment: al2subs compresses it into a Sub (per-position map + substitution list with both nucleotides and qualities), which the p-value model in pval.rs consumes to classify each difference as biological divergence vs base-calling error. Distance-only or local-only methods break this pipeline.
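To make the positional contract concrete, here is a minimal sketch of the kind of record a candidate aligner must be able to feed. `SubSketch` and `al2subs_sketch` are hypothetical, simplified stand-ins, not the actual dada2-rs definitions. The sketch assumes the aligner returns two gapped byte strings of equal length with `-` as the gap symbol, and it carries only the query-side quality.

```rust
/// Hypothetical, simplified stand-in for the `Sub` record (not the real dada2-rs type).
#[derive(Debug, PartialEq)]
struct SubSketch {
    /// For each reference position, the aligned query position (None where the query has a gap).
    map: Vec<Option<usize>>,
    /// Substitutions as (reference position, reference nt, query nt, query quality).
    subs: Vec<(usize, u8, u8, u8)>,
}

/// Compress a gapped pairwise alignment into a per-position map plus a substitution list.
/// Assumption: both aligned strings have equal length and use b'-' for gaps.
fn al2subs_sketch(ref_aln: &[u8], qry_aln: &[u8], qry_qual: &[u8]) -> SubSketch {
    assert_eq!(ref_aln.len(), qry_aln.len(), "aligned strings must have equal length");
    let mut map = Vec::new();
    let mut subs = Vec::new();
    // i and j are the ungapped positions in the reference and the query.
    let (mut i, mut j) = (0usize, 0usize);
    for (&r, &q) in ref_aln.iter().zip(qry_aln) {
        match (r != b'-', q != b'-') {
            // Aligned column: record the mapping, and a substitution if the bases differ.
            (true, true) => {
                map.push(Some(j));
                if r != q {
                    subs.push((i, r, q, qry_qual[j]));
                }
                i += 1;
                j += 1;
            }
            // Gap in the query: the reference position maps to nothing.
            (true, false) => {
                map.push(None);
                i += 1;
            }
            // Gap in the reference: the query base is consumed without a reference position.
            (false, true) => j += 1,
            // Gap against gap carries no information.
            (false, false) => {}
        }
    }
    SubSketch { map, subs }
}
```

For example, `al2subs_sketch(b"ACGTA", b"AGG-A", &[30, 12, 31, 33])` yields one substitution, C to G at reference position 1 with quality 12, and a map whose fourth entry is `None`. A method that reports only a score or distance cannot populate either field.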
Rule of thumb for what's worth evaluating: a paradigm should address the global alignment problem from a genuinely different perspective — not change the problem.
Candidates that clear that bar
- Pair-HMMs: fuse alignment and the error model into a single likelihood layer. DADA2 currently bolts pval.rs onto an NW alignment as two separate steps; a pair-HMM models error-vs-signal natively. Richer, but higher cost/complexity; assess whether it's justified for amplicon data.
- WFA (wavefront alignment): changes the complexity regime to O(n·s), where n is the sequence length and s is the alignment score (equal to the edit distance under unit costs). It exploits the same "sequences are very similar" assumption DADA2 already relies on (k-mer pre-screen + banded DP). Strongest candidate by the "different perspective, same problem" criterion. Caveat: we need a variant that retains the positional traceback, not score/edit-distance only, to feed al2subs.
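The wavefront idea can be sketched in a few lines for the simplest case. The function below is a score-only illustration under unit costs (mismatch, insertion, and deletion all cost 1). It is not gap-affine WFA and not the WFA2 library API, and the name `wavefront_edit_distance` is made up for this example. For each score s it stores, per diagonal, only the furthest offset reached, so the work grows with s rather than with the full DP matrix.

```rust
/// Unit-cost wavefront edit distance (score only, no traceback).
/// Diagonal k = j - i, where i indexes `a` and j indexes `b`;
/// each wavefront stores the furthest j reached on every diagonal.
fn wavefront_edit_distance(a: &[u8], b: &[u8]) -> usize {
    const UNREACHED: isize = -1;
    let (n, m) = (a.len() as isize, b.len() as isize);
    let max_s = (n + m) as usize; // the distance can never exceed n + m
    let center = n + m + 1; // index shift so diagonal k lives at k + center
    let width = 2 * max_s + 3;

    // Slide along diagonal k for free while the bases match.
    let extend = |k: isize, mut j: isize| -> isize {
        let mut i = j - k;
        while i < n && j < m && a[i as usize] == b[j as usize] {
            i += 1;
            j += 1;
        }
        j
    };

    let mut wf = vec![UNREACHED; width];
    wf[center as usize] = extend(0, 0);
    let target = (m - n + center) as usize; // diagonal that ends at (n, m)

    for s in 0..=max_s {
        if wf[target] >= m {
            return s; // both sequences fully consumed with s edits
        }
        let mut next = vec![UNREACHED; width];
        let reach = s as isize + 1;
        for k in -reach..=reach {
            let idx = (k + center) as usize;
            // (previous offset, how far j advances): mismatch, gap consuming b, gap consuming a.
            let candidates = [(wf[idx], 1), (wf[idx - 1], 1), (wf[idx + 1], 0)];
            let mut best = UNREACHED;
            for (off, step) in candidates {
                if off == UNREACHED {
                    continue;
                }
                let j = off + step;
                // Keep only cells that stay inside the DP matrix.
                if j <= m && j - k <= n {
                    best = best.max(j);
                }
            }
            if best != UNREACHED {
                next[idx] = extend(k, best);
            }
        }
        wf = next;
    }
    unreachable!("edit distance never exceeds a.len() + b.len()")
}
```

Note that each wavefront is discarded once the next one is built, which is exactly the limitation named in the caveat: recovering the positional alignment would require keeping the wavefronts (or equivalent backpointers) and walking them back from the final cell.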
Scope
- Literature/feasibility evaluation, not a commitment to implement.
- For any candidate: confirm it can produce the positional substitution record pval.rs needs, and benchmark vs the current anti-diagonal auto-vectorized banded NW (align_vectorized_with_buf).
- Out of scope: switching away from a global paradigm (local/Smith-Waterman would wrongly trim divergent-but-real read ends).